Abstract

In this paper, a cross-layer framework to jointly optimize spectrum sensing and scheduling in 
resource constrained agile wireless networks is presented. A network of secondary users (SUs) accesses 
portions of the spectrum left unused by a network of licensed primary users (PUs). A central controller 
(CC) schedules the traffic of the SUs, based on distributed compressed measurements collected by the 
SUs. Sensing and scheduling are jointly controlled to maximize the SU throughput, with constraints on 
PU throughput degradation and SU cost. The sparsity in the spectrum dynamics is exploited; leveraging 
a prior spectrum occupancy estimate, the CC needs to estimate only a residual uncertainty vector via 
sparse recovery techniques. The high complexity entailed by the POMDP formulation is reduced by a 
low-dimensional belief representation via minimization of the Kullback-Leibler divergence. It is proved 
that the optimization of sensing and scheduling can be decoupled. A partially myopic scheduling strategy 
is proposed for which structural properties can be proved showing that the myopic scheme allocates SU 
traffic to likely idle spectral bands. Simulation results show that this framework balances optimally the 
resources between spectrum sensing and data transmission. This framework defines sensing-scheduling 
schemes most informative for network control, yielding energy efficient resource utilization. 

I. Introduction 

The recent proliferation of mobile devices has been exponential in number as well as
heterogeneity [?]. As mobile data traffic is expected to grow 13-fold, and machine-to-machine
traffic will experience a 24-fold increase from 2012 to 2017 [?], tools for the design and
optimization of agile wireless networks are of significant interest [?]. Furthermore, network
design needs to explicitly consider the resource constraints typical of wireless systems. These
resource constraints will impact the acquisition of network state information, which is
essential for network control.

In this paper, we consider a wireless network composed of a licensed network of primary
users (PUs) dynamically accessing a spectrum with F frequency bands, and an agile network
of secondary users (SUs) which opportunistically attempt to access the portion of the spectrum
left unused by the PUs [?]. The spectrum occupancy is inferred by a central controller (CC), by
aggregating compressed spectrum measurements collected in a distributed fashion by the SUs,
and by overhearing feedback signaling from the PUs. Accordingly, the CC allocates the traffic
of the SUs across the spectrum bands. Joint sensing-scheduling policies are defined, so as to
maximize the SU throughput, under constraints on the throughput degradation caused to the PUs
and on the sensing-transmission cost incurred by the SUs.

The contributions of this paper are as follows. We propose a framework which captures the
interplay between sensing and scheduling, by trading off the cost of acquisition of network
state information and the overall network performance. Spectrum sensing is done by collecting
compressed spectrum measurements from distributed SUs and local feedback at the CC; spectrum
scheduling decisions are then made based on this information. This is in contrast to standard
formulations based on partially observable Markov decision processes (POMDPs) [?], where
observations are passively generated by control actions, rather than actively controlled via
sensing. We provide a motivational example for the case of a single spectrum band and noiseless
sensing in Sec. II, which highlights the need for adaptivity in a cross-layer and resource
constrained environment, and then extend the model to the general case. For the general case,
in Sec. V we show that the joint sensing-scheduling policy can be optimized via dynamic
programming (DP); we prove the optimality of a two-stage decomposition, which exploits the
sufficient statistics that drive the decision making process (Theorem ??), and allows one to
decouple the optimization of sensing and scheduling (Algorithm ??). Additionally, in order to
reduce the huge action space in the spectrum scheduling phase, we propose a partially myopic
scheduling scheme, where the total traffic of the SUs is determined optimally via DP, whereas
the allocation of the resulting total budget across frequency bands is determined via a myopic
maximization of the instantaneous trade-off between PU and SU throughput. We prove structural
properties of the partially myopic scheduling scheme, showing that it effectively allocates the
SU traffic to the spectrum bands more likely to be idle, thus minimizing interference to the
PUs and maximizing SU throughput, and that it can be solved efficiently using standard convex
optimization tools (Theorem ??).

In order to tackle the high complexity of the DP algorithm [?], in Sec. VI we propose
complexity reduction techniques. We employ a compact state space representation by projecting
the belief onto a low-dimensional manifold via the minimization of the Kullback-Leibler
divergence (KLD, Theorem ??). Based on the compressed belief, we design adaptive compressive
sensing (CS) schemes, which effectively exploit the sparse network dynamics typical of wireless
networks. In the spectrum sensing context analyzed in this paper, only few PUs join or leave
the spectrum at any time, so that the spectrum occupancy state exhibits sparse time variations.
Therefore, leveraging the estimate of the spectrum occupancy state in the previous slot, only a
sparse residual uncertainty vector needs to be estimated, and few measurements suffice to drive
scheduling decisions. Additionally, such representation allows us to design a state estimator
based on sparse recovery algorithms. Although the focus of this paper is on spectrum sensing in
agile wireless networks, this framework can be generalized to more general networked systems,
where the state of the system is a collection of features, rather than spectrum bands (e.g.,
buffer state of all wireless nodes, or local channel quality), which evolve sparsely over time.
These state features can be tracked by collecting a few compressed measurements via distributed
sensing, enabling more informed network control.

A. Related work 


There is significant prior work on cognitive radio and compressed sensing (CS); we have
focused on the literature that is most relevant to our current problem framework. Centralized
schemes for the tracking of sparse time-varying processes have been examined in [?]-[?], and
distributed CS has been studied in [?], [?] for static signals. In contrast to these two veins,
we study distributed CS for time-varying signals. Performance guarantees for recursive
reconstruction of sparse signals under the assumption of slow support changes are studied in
[?]; however, joint sensing and control is not examined. Recovery of static binary sparse
signals via CS has been investigated in [?], [?]. Compressive spectrum sensing has been studied
in [?], [?] for a static setting, and in [14] for a dynamic setting with noiseless
measurements, but without scheduling. We do not focus on recovery guarantees herein, but rather
embed CS into a control framework wherein the number of measurements is adapted based on prior
information in order to drive traffic scheduling decisions.

Active sensor scheduling and adaptation [?] encompass applications such as target tracking
[?], physical activity detection [?], and sequential hypothesis testing [?]. All these prior
works, including ours [20]-[22], assume that the underlying state is given by nature and is not
controlled. In contrast, in this work, states are affected by scheduling decisions, via
interference and collisions generated by the SUs to the PUs, and we design joint controlled
sensing, estimation and scheduling schemes in wireless networks, which account for the cost of
acquisition of state information and its impact on the overall network performance.

Complexity reduction of POMDPs via exponential family principal components analysis enables
planning on a small dimensional manifold in [?]. Model reduction of complex Markov chain models
using the KLD as a metric is investigated in [24]. In contrast, we develop a belief compression
method based on the Neyman-Pearson formulation of the compressive spectrum sensing problem. Our
scheme captures relevant features of the dynamic spectrum access problem, without having to
learn key statistics. As in [24], the KLD measure is also used to project the true belief onto
the low-dimensional manifold.

In this work, we assume that the PUs employ a retransmission process, which induces structure
in the PU signal. This structure has been exploited in [?] to design adaptive SU access
techniques, and in [?] to design smart interference cancellation techniques that exploit the
redundancy introduced by retransmission. In this work, instead, we exploit the structure in the
signal as a result of sparse network dynamics, to design compressive spectrum sensing
techniques and sparse recovery schemes. We extend the model studied in [27] to include a more
general traffic model for the SU network, and propose a low-complexity solution based on the
aforementioned partially myopic scheduling scheme.

This paper is organized as follows. In Sec. II, we provide an example which motivates the need
for adaptivity in a cross-layer and resource constrained environment, for the case with a
single frequency band and noiseless sensing. In Sec. III, we present the system model for the
general case with multiple frequency bands and noisy sensing. In Sec. IV, we present the
optimization problem and, in Sec. V, the proposed optimization techniques. In Sec. VI, we
present techniques for complexity reduction based on belief approximation via KLD minimization
and sparse recovery algorithms. In Sec. VII, we present numerical results and, in Sec. VIII, we
conclude the paper. The proofs of the analytical results are provided in the Appendix.
Figure 1. Licensed network of PUs and opportunistic network of SUs. The SU-CC receives spectrum measurements and
controls the SU network accordingly. SU transmissions generate interference to the PU network.

II. Motivation: single frequency band and noiseless sensing 

In this section, we provide an example which motivates the need for adaptivity in a cross-layer
and resource constrained environment, by comparing the performance achieved by non-adaptive
sensing strategies (Sec. II-A) with that achieved by adaptive schemes (Sec. II-B). In
particular, we focus on the special case of a single frequency band and noiseless sensing.
Consider a network of $N_S$ SUs with sensing capability, which attempt to access a licensed
channel (single frequency band), represented in Fig. 1. Herein, for mathematical convenience,
we use the approximation $N_S \to \infty$ to derive the transition probabilities and
performance of the system.¹ The following discussion can be generalized to $N_S < \infty$. The
channel occupancy state in slot $k$ is denoted as $b_k \in \{0,1\}$, where $b_k = 0$ if the
channel is idle and $b_k = 1$ if it is occupied by a PU.

The SUs opportunistically access the spectrum based on the traffic decision $r_k$ broadcast
by the CC. Given $r_k$, each SU, assumed to be backlogged, transmits data independently with
probability $r_k/N_S$, incurring the transmission cost $c_{TX}$; otherwise, the SU remains
idle, incurring no cost. We employ a collision channel model, i.e., if more than one terminal
(either SUs or PUs) transmits on the same channel, those packets cannot be decoded correctly at
the corresponding receiver and are lost. Otherwise, if one and only one user transmits, then
the transmission is successful with probability $1-\rho_S$ (for the SU) and $1-\rho_P$ (for the
PU). This collision model represents a worst-case scenario, and thus provides performance
guarantees.


The value $r_k = 1$ maximizes the throughput for the SUs [28], and any larger value $r_k > 1$
degrades both the PU and SU throughputs, and incurs a larger energy cost for the SUs. We thus
restrict $r_k$ to take values in $[0,1]$.

The success probability for the PUs as a function of $b_k$ and $r_k$ is given by

    $P_{P,\mathrm{succ}}(b_k, r_k) = b_k (1-\rho_P) e^{-r_k}$,    (1)

where the probability of no collisions from the SUs satisfies
$(1 - r_k/N_S)^{N_S} \to e^{-r_k}$ for $N_S \to \infty$.
Similarly, the probability of successful transmission for the SU system is given by

    $P_{S,\mathrm{succ}}(b_k, r_k) = (1-b_k)(1-\rho_S) r_k e^{-r_k}$,    (2)

where the probability that one and only one SU transmits satisfies
$r_k (1 - r_k/N_S)^{N_S-1} \to r_k e^{-r_k}$, and, if the channel is occupied by one PU, the
transmission fails due to collisions.
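
As a quick numerical illustration of the collision model, the following minimal Python sketch evaluates the asymptotic PU and SU success probabilities and compares the limit $e^{-r}$ with its finite-population counterpart. The function names and the argument names `rho_p`, `rho_s` (decoding failure probabilities of the PU and SU links) are illustrative choices, not notation fixed by the paper.

```python
import math

def pu_success(b, r, rho_p):
    """PU success probability: the band is busy (b = 1), no SU collides
    (probability e^{-r} in the large-population limit), and decoding succeeds."""
    return b * (1.0 - rho_p) * math.exp(-r)

def su_success(b, r, rho_s):
    """SU success probability: the band is idle (b = 0), exactly one SU
    transmits (probability r e^{-r} in the limit), and decoding succeeds."""
    return (1 - b) * (1.0 - rho_s) * r * math.exp(-r)

def no_su_transmission_finite(r, n_s):
    """Probability that none of n_s SUs transmits, each with probability r/n_s."""
    return (1.0 - r / n_s) ** n_s

# Asymptotic value versus a population of 10 SUs, for r = 1 and an error-free PU link.
print(round(pu_success(1, 1.0, 0.0), 3), round(no_su_transmission_finite(1.0, 10), 3))
```

With these inputs the two printed values are 0.368 and 0.349, which suggests that the asymptotic expression is already a reasonable approximation for a small number of SUs.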

The PUs implement a retransmission mechanism in case of transmission failure. Retransmissions
are performed in the same channel, in the next slot. If the transmission is successful,
then the PU occupying the channel either has a new data packet to transmit in the next slot,
with probability $\theta$, or leaves the spectrum idle. An idle channel is occupied by a new PU
with probability $\zeta \in (0,1)$ and it remains idle otherwise. Therefore, the state
$b_k \in \{0,1\}$ is a two-state Markov chain, whose transition probabilities depend on the
allocated SU traffic, $r_k \in [0,1]$. The transition probability from state $b_k = b$ to
$b_{k+1} = b'$, given $r_k = r$, denoted as
$P_B(b'|b,r) = P(b_{k+1} = b' | b_k = b, r_k = r)$, is given by

    $P_B(1|b,r) = (1-b)\zeta + [b - (1-\theta)(1-\zeta) P_{P,\mathrm{succ}}(b,r)]$

and $P_B(0|b,r) = 1 - P_B(1|b,r)$. In fact, the channel is occupied in the next slot if and only
if one of the following events occurs: the PU transmits successfully and it has a new data
packet to transmit, with probability $\theta$; a new PU arrives, with probability $\zeta$; or
the transmission of the PU is unsuccessful and thus a retransmission is required.
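
The effect of SU traffic on the PU occupancy chain can be sketched numerically. The snippet below is a minimal illustration of the two-state transition matrix described above; the function names and the parameter values (`THETA`, `ZETA`, `RHO_P`) are hypothetical and chosen only for the example.

```python
import math

def pu_success(b, r, rho_p):
    """Asymptotic PU success probability for occupancy b and SU traffic r."""
    return b * (1.0 - rho_p) * math.exp(-r)

def p_next_occupied(b, r, theta, zeta, rho_p):
    """Probability that the band is occupied in the next slot, given the
    current occupancy b and SU traffic r."""
    return (1 - b) * zeta + (b - (1.0 - theta) * (1.0 - zeta) * pu_success(b, r, rho_p))

def transition_matrix(r, theta, zeta, rho_p):
    """Rows are indexed by the current state b, columns by the next state b'."""
    rows = []
    for b in (0, 1):
        p1 = p_next_occupied(b, r, theta, zeta, rho_p)
        rows.append([1.0 - p1, p1])
    return rows

# Hypothetical parameters: new-packet probability, PU arrival probability, PU error rate.
THETA, ZETA, RHO_P = 0.5, 0.1, 0.0
quiet = transition_matrix(0.0, THETA, ZETA, RHO_P)   # SUs silent
loaded = transition_matrix(1.0, THETA, ZETA, RHO_P)  # SU traffic r = 1
print(quiet[1][1], loaded[1][1])
```

Under these assumptions, SU traffic leaves the idle-to-busy transition unchanged but makes a busy band more likely to stay busy, since collisions trigger PU retransmissions.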

Remark 1 The retransmission protocol implemented by the PUs can also be exploited by
leveraging the redundancy of the retransmission process, using a technique termed chain
decoding [26] to remove the interference of the PU signal over the retransmission windows of
the PU. In this paper, we assume slot-by-slot decoding, so that the redundancy of the PU
retransmission protocol is not exploited for interference cancellation. The extension is left
for future work.


A. Non-adaptive spectrum sensing 

The SU traffic is scheduled based on spectrum measurements collected by the SUs in a
distributed fashion. Consider a scheme where the SUs collect and report to the CC noiseless
spectrum measurements at the beginning of slot $k$, with probability $\alpha = \psi/N_S$
independently in each slot, incurring the sensing-transmission cost $c_S$, and they remain idle
otherwise, incurring no cost. The parameter $\psi \in [0,1]$ denotes the average SU sensing
traffic. The SUs share a control channel to report their measurements, resulting in packet
losses if more than one SU transmits on the same channel. The probability that the CC collects
the spectrum measurement is thus given by
$p_S = N_S \alpha (1-\alpha)^{N_S-1} \to \psi e^{-\psi}$ (for $N_S \to \infty$).

¹Note that the analysis under the asymptotic assumption $N_S \to \infty$ yields a good approximation even when $N_S$ is finite, e.g.,
$N_S = 10$. For instance, if $\rho_P = 0$ and $r_k = 1$ in (1), or $\rho_S = 0$ in (2), we obtain $P_{P,\mathrm{succ}}(1,1) = P_{S,\mathrm{succ}}(0,1) \simeq 0.368$ in the
asymptotic case and $P_{P,\mathrm{succ}}(1,1) = P_{S,\mathrm{succ}}(0,1) \simeq 0.349$ when $N_S = 10$.

Assume that the SUs are not allowed to cause any degradation to the PU system. Then, the
SU traffic is $r_k = r \in [0,1]$ in those slots where the channel is detected by the CC to be
idle, otherwise no traffic is allowed (in order not to interfere with the PUs). In particular,
if no measurement is collected, no SU transmissions are allowed, due to the uncertainty in the
current channel state. The average long-term sensing and transmission cost incurred by the SUs,
and the SU and PU throughputs are given by

    $C_{\mathrm{sensing}}(\psi, r) = \psi c_S$,    $C_{\mathrm{sched}}(\psi, r) = \pi_P(0) \psi e^{-\psi} r c_{TX}$,    (3)

    $T_S(\psi, r) = (1-\rho_S) \pi_P(0) \psi e^{-\psi} r e^{-r}$,    $T_P(\psi, r) = (1-\rho_P) \pi_P(1)$,    (4)

where $\pi_P(0)$ and $\pi_P(1)$ are, respectively, the steady-state probabilities of the channel
being idle and occupied, given by

    $\pi_P(0) = \frac{P_B(0|1,0)}{P_B(0|1,0) + P_B(1|0,r)} = \frac{(1-\theta)(1-\zeta)(1-\rho_P)}{(1-\theta)(1-\zeta)(1-\rho_P) + \zeta}$    (5)

and $\pi_P(1) = 1 - \pi_P(0)$. In fact, sensing is done independently in each slot, incurring the
expected sensing cost $\psi c_S$. If the measurement is received successfully (with probability
$\psi e^{-\psi}$) and the channel is detected to be idle (with steady-state probability
$\pi_P(0)$), then the data transmission cost $r c_{TX}$ is incurred in the scheduling phase, and
the expected throughput achieved is $(1-\rho_S) r e^{-r}$.
We want to determine $(\psi^*, r^*)$ such that

    $(\psi^*, r^*) = \arg\max_{\psi, r} T_S(\psi, r)$  s.t.  $C(\psi, r) \le C^{\max}$,    (6)

where we have defined the sensing-transmission cost
$C(\psi, r) = C_{\mathrm{sensing}}(\psi, r) + C_{\mathrm{sched}}(\psi, r)$,
and $C^{\max} < c_S + \pi_P(0) e^{-1} c_{TX}$ (achieved with $\psi = r = 1$). The above
optimization problem allows us to define the joint sensing-scheduling strategy that balances
optimally between the cost of acquisition of state information via distributed sensing and the
overall network goal of maximizing the SU throughput and, at the same time, avoiding
interference to the PUs.
Since both $C(\psi, r)$ and $T_S(\psi, r)$ are increasing functions of $\psi \in [0,1]$ and
$r \in [0,1]$, under the optimal strategy we have $C(\psi^*, r^*) = C^{\max}$, yielding the
optimal $r$ as a function of $\psi$,

    $r(\psi) = \frac{C^{\max} - \psi c_S}{\pi_P(0) \psi e^{-\psi} c_{TX}}$,    (7)

where $\psi^* \le \min\{C^{\max}/c_S, 1\}$. Hence $\psi^*$ can be determined as

    $\psi^* = \arg\max_{\psi} T_S(\psi, r(\psi))$,    (8)

by exhaustive search. When $C^{\max} \ll c_S$, hence $\psi \ll 1$, we approximate
$e^{-\psi} \simeq 1$, thus obtaining

    $T_S(\psi, r(\psi)) \simeq (1-\rho_S) \frac{C^{\max} - \psi c_S}{c_{TX}} \exp\left\{-\frac{C^{\max} - \psi c_S}{\pi_P(0) \psi c_{TX}}\right\} = T_S^{\mathrm{up}}(\psi, r(\psi))$,    (9)

which represents an upper bound to $T_S(\psi, r(\psi))$ for the general case. This upper bound
can be optimized in closed form, yielding the upper bound optimizing $\psi^*$ and $r^*$:

    $\psi^* = \min\left\{ \frac{2 C^{\max}/c_S}{1 + \sqrt{1 + 4 \pi_P(0) c_{TX}/c_S}}, 1 \right\}$,    (10)

    $r^* = r(\psi^*) = \frac{C^{\max} - \psi^* c_S}{\pi_P(0) \psi^* e^{-\psi^*} c_{TX}}$.    (11)

B. Adaptive spectrum sensing-scheduling 

The above non-adaptive sensing strategy does not provide the best performance possible due 
to its static nature. Indeed, it may be beneficial to adapt the sensing strategy over time, i.e., by 
selectively sensing the channel state based on the prior channel information, in order to make 
the best use of the scarce resources available to the SUs. We now demonstrate the importance 
of using adaptive sensing schemes to optimize the performance of the system, as a means to 
effectively cope with the cost of acquisition of state information for network control. 

Thus, we consider the scenario where the sensing traffic $\psi$ is adapted over time. We denote
the belief state at the CC as $(b, \tau)$, where $\tau \ge 0$ denotes the number of slots since
the last measurement was collected, and $b \in \{0,1\}$ denotes the last channel state detected.
For instance, $b_k = 0$, $\tau_k = 1$ denotes that the spectrum was detected as idle in slot
$k-1$. Let $\psi(b, \tau) \in [0,1]$ be the sensing traffic, i.e., the expected number of
measurements collected by the network of SUs, when the state is $(b, \tau)$, so that the
probability that a measurement is successfully collected is given by
$p_S(b, \tau) = \psi(b, \tau) e^{-\psi(b, \tau)}$. We denote the prior steady-state distribution
(before sensing) that the belief is $(b, \tau)$ as $\pi(b, \tau)$; similarly, we denote the
posterior steady-state distribution (after sensing) that the belief is $(b, \tau)$ as
$\hat\pi(b, \tau)$. The steady-state equations relating the prior to the posterior steady-state
probabilities are given by

    $\hat\pi(b, \tau) = \pi(b, \tau)(1 - p_S(b, \tau))$,  $\tau > 0$,    (12)

    $\hat\pi(b, 0) = \sum_{b' \in \{0,1\}} \sum_{\tau=1}^{\infty} \pi(b', \tau) p_S(b', \tau) P^{(\tau)}(b|b')$,    (13)

where $P^{(\tau)}(b|b')$ is the $\tau$-step probability of transition of the channel from state
$b'$ to state $b$. In fact, the posterior belief $(b, \tau)$ for $\tau > 0$ is reached from the
prior belief $(b, \tau)$ if no measurement is successfully collected at the CC. On the other
hand, the posterior belief $(b, 0)$ is reached if the measurement is collected at the CC and the
channel state $b$ is detected.

Similarly, the steady-state equations relating the posterior to the prior steady-state
probability in the next slot are given by

    $\pi(b, \tau) = \hat\pi(b, \tau - 1)$,  $\forall b \in \{0,1\}$, $\forall \tau \ge 1$.    (14)

In fact, since we are moving to the next slot, the information about the last state detected
becomes outdated by one more slot. By solving the system of equations (12)-(14), we obtain

    $\pi(1, \tau) = \frac{f(1)}{\sum_{b \in \{0,1\}} f(b) \sum_{t=1}^{\infty} \prod_{i=1}^{t-1} (1 - p_S(b, i))} \prod_{i=1}^{\tau-1} (1 - p_S(1, i))$,  $\tau \ge 1$,    (15)

    $\pi(0, \tau) = \frac{f(0)}{\sum_{b \in \{0,1\}} f(b) \sum_{t=1}^{\infty} \prod_{i=1}^{t-1} (1 - p_S(b, i))} \prod_{i=1}^{\tau-1} (1 - p_S(0, i))$,  $\tau \ge 1$,    (16)

where we have defined

    $f(b) = \sum_{\tau=1}^{\infty} \prod_{i=1}^{\tau-1} (1 - p_S(1-b, i)) \, p_S(1-b, \tau) \, P^{(\tau)}(b|1-b)$.    (17)

We thus obtain

    $C(\psi, r) = \sum_{b \in \{0,1\}} \sum_{\tau=1}^{\infty} \pi(b, \tau) \psi(b, \tau) c_S + \hat\pi(0, 0) r c_{TX}$,    (18)

    $T_S(\psi, r) = (1 - \rho_S) \hat\pi(0, 0) r e^{-r}$,    (19)

where we have used the fact that data transmission occurs only when the spectrum is detected
to be idle (state $(0, 0)$), with cost $r c_{TX}$ and instantaneous throughput
$(1 - \rho_S) r e^{-r}$.

The goal is to define jointly the sensing-scheduling policy $(\psi, r)$ solving (6). The optimal
policy can be determined via dynamic programming. For simplicity and for the sake of exposition,
here we evaluate the performance of a heuristic adaptive sensing policy such that
$\psi(b, \tau) = \psi(b)$, hence $p_S(b, \tau) = p_S(b)$, i.e., the sensing probability is only
adapted to the value of the last state detected, rather than the delay parameter $\tau$. In this
case, we obtain

    $\pi(b, \tau) = \frac{f(b)}{f(0)/p_S(0) + f(1)/p_S(1)} (1 - p_S(b))^{\tau-1}$,  $\tau \ge 1$, $b \in \{0,1\}$,    (20)

and therefore

    $C(\psi, r) = c_S \left[ \sum_{\tau=1}^{\infty} \pi(0, \tau) \psi(0) + \sum_{\tau=1}^{\infty} \pi(1, \tau) \psi(1) \right] + \hat\pi(0, 0) r c_{TX}$.

Figure 2. (a) SU throughput versus total cost of sensing-scheduling. (b) Ratio between sensing cost and total cost versus total
cost of sensing-scheduling.

By optimizing numerically the SU throughput $T_S(\psi, r)$ with respect to
$(\psi(0), \psi(1), r)$, we obtain the plot in Fig. 2a, where we also plot the non-adaptive
sensing policy (unless otherwise stated, the parameters are given as in Sec. VII). We observe
that the adaptive scheme achieves twice as much SU throughput as the non-adaptive one for low
values of the cost budget, a regime typical of wireless systems. In Fig. 2b, we plot the ratio
between the sensing cost $C_{\mathrm{sensing}}$ and the total budget $C^{\max}$. For both
schemes, more than 65% of the resources is spent for sensing, and consequently less than 35% is
used for SU data transmission.
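
The steady-state analysis of the heuristic adaptive policy, in which the sensing traffic depends only on the last detected state, can be evaluated numerically. The following minimal sketch truncates the infinite sums at a finite `horizon` and uses the PU transition matrix with silent SUs on busy slots (SUs transmit only after detecting an idle band). Function names, the truncation, and all parameter values are assumptions made for illustration, not the paper's evaluation code.

```python
import math

def transition_matrix(theta, zeta, rho_p):
    """One-step PU channel transitions P[b][b'] when SUs never hit a busy band."""
    p01 = zeta
    p10 = (1.0 - theta) * (1.0 - zeta) * (1.0 - rho_p)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adaptive_metrics(psi, r, c_s, c_tx, rho_s, P, horizon=2000):
    """Cost and SU throughput when psi = (psi(0), psi(1)) depends only on the
    last detected state. Returns (cost, throughput, q) with q[b] = pi(b, 1)."""
    p_s = [x * math.exp(-x) for x in psi]   # measurement collection probabilities
    f = [0.0, 0.0]
    P_tau = [[1.0, 0.0], [0.0, 1.0]]
    for tau in range(1, horizon + 1):
        P_tau = mat_mul(P_tau, P)            # tau-step transition matrix
        for b in (0, 1):
            other = 1 - b
            f[b] += (1.0 - p_s[other]) ** (tau - 1) * p_s[other] * P_tau[other][b]
    norm = f[0] / p_s[0] + f[1] / p_s[1]
    q = [f[b] / norm for b in (0, 1)]        # also the posterior mass of (b, 0)
    sensing = c_s * sum(q[b] / p_s[b] * psi[b] for b in (0, 1))
    cost = sensing + q[0] * r * c_tx
    throughput = (1.0 - rho_s) * q[0] * r * math.exp(-r)
    return cost, throughput, q

P = transition_matrix(theta=0.5, zeta=0.1, rho_p=0.0)   # hypothetical PU parameters
print(adaptive_metrics((0.6, 0.1), 0.8, 1.0, 1.0, 0.0, P))
```

A useful consistency check is that, when $\psi(0) = \psi(1)$, the cost and throughput returned by `adaptive_metrics` coincide with the non-adaptive expressions of Sec. II-A.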

This example demonstrates the importance of taking into account the cost of acquisition of 
state information for network control, and the importance of using adaptive sensing schemes to 
optimize the performance of the system. In the next section, we investigate the more general 
case with noisy compressed measurements collected by the SUs, F > 1 frequency bands, B > 1 
control channels employed by the SUs to report their measurements, and feedback from the PUs. 
III. System Model: multiple frequency bands and noisy sensing


In this section, we extend the model considered in the previous section to the more practical
setting with multiple frequency bands and noisy sensing. These factors introduce two difficulties
in the problem: 1) due to the potentially large number of spectrum bands that need to be measured,
the SUs would incur a significant cost to sense each spectrum band independently; in order to
reduce this cost, we employ compressive spectrum sensing, where each SU collects a compressed
measurement of the spectrum and transmits it to the CC, thus incurring only a fraction of the
cost; this technique, in turn, complicates the design of the estimator and controller at the CC.
2) Due to the noise in the spectrum measurements, there is always some residual uncertainty in
the current estimate, thus zero-interference operation is not possible (unless the SUs remain idle
all the time); additionally, the accuracy of the spectrum estimate will depend on the number of
measurements received at the CC, so that the more the measurements, the better the estimate.
This factor introduces a requirement that $B > 1$, i.e., multiple control channels should be
employed by the SUs to report their measurements, as opposed to the noiseless case, where
one measurement suffices, and thus $B = 1$. In Secs. III-A and III-B we introduce the models
of spectrum scheduling and (compressed) spectrum sensing, respectively, and, in Sec. III-C, we
characterize the dynamics of the system.

We consider a licensed spectrum composed of $F$ frequency bands, represented in Fig. 1. Let
$\mathbf{b}_k = (b_{k,1}, b_{k,2}, \ldots, b_{k,F})^T$ be the $F$-dimensional spectrum occupancy
(column) vector at time $k$, where $^T$ denotes matrix transpose, and $b_{k,i} \in \{0,1\}$ is
the occupancy state of the $i$th band.

The system is time-slotted with slot duration 1 and operates in two phases [29]: a sensing
phase, of duration $d$, during which the SUs collect compressed distributed measurements of the
spectrum occupancy state and report them to a CC (e.g., a base station) (Sec. III-B); followed
by a scheduling phase, of duration $1 - d$, where the SUs access the spectrum based on the
scheduling decision of the CC (Sec. III-A).

A. Spectrum Scheduling 


In the spectrum scheduling phase, the SUs opportunistically access the spectrum based on the
traffic vector decision $\mathbf{r}_k$ broadcast by the CC, where
$\mathbf{r}_k = (r_{k,1}, \ldots, r_{k,F})^T \in [0,1]^F$, and $r_{k,i}$ is the average SU
traffic in the $i$th spectrum band at time $k$. In each spectrum band, the dynamics of the PU
system evolve as described in Sec. II.

We define the aggregate expected instantaneous throughput for the SU and PU systems,
respectively, given $\mathbf{b}_k$ and $\mathbf{r}_k$, as

    $T_X(\mathbf{b}_k, \mathbf{r}_k) = \sum_{i=1}^{F} P_{X,\mathrm{succ}}(b_{k,i}, r_{k,i})$,  $X \in \{S, P\}$.    (21)


B. Spectrum Sensing 

At the beginning of slot $k$, the spectrum occupancy $\mathbf{b}_k$ is inferred by collecting
noisy compressed spectrum measurements by the SUs,² according to the observation model (for SU
$j$)

    $y_{k,j} = \mathbf{a}_{k,j}^T \mathbf{b}_k + n_{k,j}$,  $\forall j = 1, 2, \ldots, N_S$,    (22)

where $n_{k,j} \sim \mathcal{N}(0, \sigma_z^2)$³ is Gaussian noise, i.i.d. over time and across
SUs, $\mathbf{a}_{k,j}$ is the measurement vector, and the superscript "T" denotes the matrix or
vector transpose. Eq. (22) is the result of filtering over the spectrum band, so that
$\mathbf{a}_{k,j}$ denotes the filtering coefficient vector, which includes also the signal
attenuation between the PU and the SU. We assume that
$\mathbf{a}_{k,j} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_F)$, where $\mathbf{I}_n$ is the
$n \times n$ identity matrix, and that $\mathbf{a}_{k,j}$ is known to the CC.

Remark 2 Note that each SU can, in principle, estimate the spectrum occupancy state
$\mathbf{b}_k$ based only on local measurements $y_{k,j}$. However, if $F$ is large, or the
measurement is very noisy ($\sigma_z^2 \gg 1$), the estimation accuracy may be very poor. In
contrast, by collecting measurements from a large number of SUs, the CC can estimate
$\mathbf{b}_k$ more accurately.


The SUs share $B$ orthogonal control channels to report their measurements, resulting in packet
losses if more than one SU transmits on the same channel. The SU sensing traffic in each control
channel is $\psi_k$ in slot $k$, whose value is broadcast by the CC to the SUs at the beginning
of slot $k$, so that the SUs activate with common probability $\alpha_k = \psi_k B/N_S$, and
transmit their measurement in one of the $B$ channels available, incurring the
sensing-transmission cost $c_S$. No cost is incurred by staying inactive.

²We assume that the measurements are collected by the SUs. However, the analysis can be extended to the case where the
sensors collecting the measurements and the SUs performing spectrum access do not coincide.

³For simplicity and without loss of generality, we consider real-valued quantities. The following framework and analysis can
be extended to complex-valued ones.

Figure 3. Block diagram of the system dynamics.

We denote the set of SUs that activate to sense and report their measurement as $\mathcal{A}_k$
with cardinality $A_k$, and the set of SUs that successfully report their measurement to the CC
as $\mathcal{M}_k$ with cardinality $M_k$. We define the probability that $M_k = m$ measurements
are successfully received at the CC, given that $A_k = a$ SUs activate, as
$P_{M|A}(m|a) = P(M_k = m | A_k = a)$. Moreover, we define the probability that $M_k = m$
measurements are successfully received at the CC, given the sensing traffic $\psi_k$, as
$P_M(m|\psi_k) = E_{A_k}[P_{M|A}(m|A_k) | \psi_k]$, by taking the expectation with respect to
$A_k \sim \mathcal{B}(N_S, \alpha_k)$. Assuming a collision model for the $B$ control channels
and $N_S \to \infty$, the number of measurements received at the CC, $M_k$, has binomial
distribution with $B$ trials and success probability $\psi_k e^{-\psi_k}$ [20], i.e.,

    $P_M(m|\psi_k) = P(M_k = m | \psi_k) = \binom{B}{m} (\psi_k e^{-\psi_k})^m (1 - \psi_k e^{-\psi_k})^{B-m}$,

so that the expected number of measurements received is
$E[M_k | \psi_k] = B \psi_k e^{-\psi_k}$. In the following treatment, we use this approximation,
although the model can be extended to finite $N_S$ and more general channel models, by defining
$P_M(m|\psi_k)$ accordingly.
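
The binomial model for the number of collected measurements is easy to tabulate. The minimal sketch below computes the pmf for a given per-channel sensing traffic and number of control channels, and checks its mean against $B \psi e^{-\psi}$; the function names and the values of `psi` and `B` are illustrative only.

```python
import math

def measurement_pmf(psi, B):
    """pmf of the number of measurements received over B control channels,
    each delivering one measurement with probability psi * exp(-psi)."""
    q = psi * math.exp(-psi)
    return [math.comb(B, m) * q ** m * (1.0 - q) ** (B - m) for m in range(B + 1)]

def expected_measurements(psi, B):
    """Mean of the pmf above, computed by direct summation."""
    return sum(m * p for m, p in enumerate(measurement_pmf(psi, B)))

pmf = measurement_pmf(psi=0.5, B=8)   # hypothetical sensing traffic and channel count
print([round(p, 4) for p in pmf])
print(expected_measurements(0.5, 8))
```

Since $\psi e^{-\psi}$ peaks at $\psi = 1$, increasing the sensing traffic beyond one SU per control channel on average only adds collisions and cost without yielding more measurements.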


Let $\mathbf{y}_k \in \mathbb{R}^{M_k}$ be the vector of compressed measurements collected in
slot $k$. From (22),

    $\mathbf{y}_k = \mathbf{A}_k \mathbf{b}_k + \mathbf{n}_k$,    (23)

where $\mathbf{A}_k = [\mathbf{a}_{k,j}]_{j \in \mathcal{M}_k}^T$ is the $M_k \times F$
measurement matrix, known to the CC, and $\mathbf{n}_k = [n_{k,j}]_{j \in \mathcal{M}_k}$ is the
noise column vector. Note that the size of $\mathbf{y}_k$, $M_k$, is random, due to the
probabilistic activation decision of each SU and packet losses resulting from the shared
wireless control channels.

C. System dynamics 

The dynamics of the system in each slot can be summarized as follows (see also Fig. 3):

1) Sensing phase: At the beginning of slot $k$, the sensing traffic $\psi_k$ is selected by the
CC and broadcast to the SUs; each SU collects a compressed measurement with probability
$\alpha_k = \psi_k B/N_S$ and transmits it independently in one of the $B$ control channels;

2) Measurement collection: The measurement vector $\mathbf{y}_k \in \mathbb{R}^{M_k}$ is
collected at the CC, where $M_k \sim P_M(\cdot|\psi_k)$;

3) Scheduling phase: the traffic vector $\mathbf{r}_k \in [0,1]^F$ is chosen by the CC and
broadcast to the SUs; each SU transmits its own data with probability $r_{k,i}/N_S$ in the $i$th
band;

4) State dynamics: the state transitions from $b_{k,i}$ to $b_{k+1,i}$ with probability
$P_B(b_{k+1,i}|b_{k,i}, r_{k,i})$ in the $i$th spectrum band.

We denote the prior belief that $\mathbf{b}_k = \mathbf{b}$, based on the history collected up
to time $k$, denoted as $\mathcal{H}_k$, and before the sensing phase, as
$\pi_k(\mathbf{b}) = P(\mathbf{b}_k = \mathbf{b} | \mathcal{H}_k)$. Similarly, we denote the
posterior belief that $\mathbf{b}_k = \mathbf{b}$, given
$(\mathcal{H}_k, \mathbf{y}_k, \mathbf{A}_k)$, as
$\hat\pi_k(\mathbf{b}) = P(\mathbf{b}_k = \mathbf{b} | \mathcal{H}_k, \mathbf{y}_k, \mathbf{A}_k)$.
Using (23), we have that

    $\hat\pi_k(\mathbf{b}) \propto \pi_k(\mathbf{b}) \exp\left\{ -\frac{1}{2\sigma_z^2} \| \mathbf{y}_k - \mathbf{A}_k \mathbf{b} \|_2^2 \right\}$,    (24)

where $\propto$ denotes proportionality up to a normalization factor, so that we can write
$\hat\pi_k = \hat\Pi(\pi_k, \mathbf{A}_k, \mathbf{y}_k)$, for a proper function
$\hat\Pi(\cdot)$.
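
For a small number of bands, the Bayesian belief update with Gaussian measurement noise can be carried out by brute-force enumeration of all occupancy vectors. The sketch below is a minimal illustration that works in the log domain to avoid underflow; the function name, the dictionary representation of the belief, and the example matrix `A` and noise variance are assumptions made for the example (the enumeration is exponential in the number of bands, which is precisely what motivates the complexity reduction discussed in the paper).

```python
import itertools
import math

def posterior_belief(prior, A, y, sigma2):
    """Bayes update of a belief over occupancy vectors.
    prior: dict mapping tuples b in {0,1}^F to probabilities;
    A: list of measurement rows; y: list of measurements; sigma2: noise variance."""
    log_w = {}
    for b, p in prior.items():
        if p <= 0.0:
            continue
        # Squared residual ||y - A b||^2 for this candidate occupancy vector.
        resid = sum((y_j - sum(a * b_i for a, b_i in zip(row, b))) ** 2
                    for row, y_j in zip(A, y))
        log_w[b] = math.log(p) - resid / (2.0 * sigma2)
    shift = max(log_w.values())
    w = {b: math.exp(v - shift) for b, v in log_w.items()}
    total = sum(w.values())
    return {b: w.get(b, 0.0) / total for b in prior}

# Hypothetical example: F = 3 bands, 2 compressed measurements, uniform prior.
F = 3
prior = {b: 1.0 / 2 ** F for b in itertools.product((0, 1), repeat=F)}
A = [[1.0, 2.0, 4.0], [1.0, -1.0, 0.5]]
b_true = (1, 0, 1)
y = [sum(a * b_i for a, b_i in zip(row, b_true)) for row in A]  # noiseless observation
post = posterior_belief(prior, A, y, sigma2=0.25)
print(max(post, key=post.get), round(post[b_true], 3))
```

When no measurement is received, the lists `A` and `y` are empty and the posterior coincides with the prior, which matches the intuition that the belief is unchanged by an empty observation.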

The CC, at the end of the slot, may overhear the PU acknowledgments of correct (ACK)
or incorrect (NACK) reception of the packets, fed back by the PU receivers on each channel,
denoted as $p_{k,i} \in \{\mathrm{ACK}, \mathrm{NACK}, \emptyset\}$, where
$p_{k,i} = \emptyset$ if either an erasure occurs (the ACK/NACK message cannot be detected by
the CC) or the $i$th band was idle, so that no feedback information is reported. We denote the
erasure probability as $\epsilon \in [0,1]$, and the feedback vector collected at the CC at the
end of slot $k$ as $\mathbf{p}_k$. Given $b_{k,i}$ and $r_{k,i}$, the probability mass function
(pmf) of $p_{k,i}$, denoted as
$P_p(p|b, r) = P(p_{k,i} = p | b_{k,i} = b, r_{k,i} = r)$, is given by

    $P_p(\emptyset|b, r) = 1 - b + b\epsilon$,    $P_p(\mathrm{ACK}|b, r) = (1-\epsilon) P_{P,\mathrm{succ}}(b, r)$,    (25)

    $P_p(\mathrm{NACK}|b, r) = (1-\epsilon)(b - P_{P,\mathrm{succ}}(b, r))$,    (26)

where $\chi(\cdot)$ is the indicator function. Therefore, the pmf of $\mathbf{p}_k$ given
$\mathbf{b}_k$ and $\mathbf{r}_k$ is given by

    $P(\mathbf{p}_k = \mathbf{p} | \mathbf{b}_k = \mathbf{b}, \mathbf{r}_k = \mathbf{r}) = \prod_{i=1}^{F} P_p(p_i|b_i, r_i)$.    (27)
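
The per-band feedback model can be checked numerically. The following minimal sketch evaluates the pmf of the overheard PU feedback for one band and the product form over bands; the string label `"NONE"` stands for the no-feedback outcome, and the function names and parameter values are illustrative assumptions.

```python
import math

def feedback_pmf(b, r, eps, rho_p):
    """pmf of the PU feedback overheard on one band with occupancy b and SU
    traffic r; eps is the erasure probability, rho_p the PU decoding error rate."""
    succ = b * (1.0 - rho_p) * math.exp(-r)   # PU success probability
    return {
        "ACK": (1.0 - eps) * succ,
        "NACK": (1.0 - eps) * (b - succ),
        "NONE": 1.0 - b + b * eps,            # idle band or erased feedback
    }

def feedback_vector_prob(p_vec, b_vec, r_vec, eps, rho_p):
    """Probability of a feedback vector: product of the per-band pmfs."""
    prob = 1.0
    for p, b, r in zip(p_vec, b_vec, r_vec):
        prob *= feedback_pmf(b, r, eps, rho_p)[p]
    return prob

# Hypothetical example with three bands.
print(feedback_pmf(1, 0.5, 0.2, 0.1))
print(feedback_vector_prob(("ACK", "NONE", "NACK"), (1, 0, 1), (0.2, 0.9, 0.5), 0.2, 0.1))
```

Note that an idle band always produces the no-feedback outcome, so observing an ACK or NACK reveals that the band was occupied.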

Given $\hat\pi_k$, $\mathbf{r}_k$ and $\mathbf{p}_k$, the CC updates the next prior belief as

$$\pi_{k+1}(\mathbf{b}') = \sum_{\mathbf{b}} \hat\pi_k(\mathbf{b}) \prod_{i=1}^{F} P_{B|P}(b'_i|b_i, r_{k,i}, p_{k,i}), \tag{28}$$

where $P_{B|P}(b'|b,r,p)=P(b_{k+1,i}=b'|b_{k,i}=b,r_{k,i}=r,p_{k,i}=p)$, given by

$$P_{B|P}(1|1,r,\mathrm{ACK}) = \theta + (1-\theta)\zeta, \qquad P_{B|P}(1|1,r,\mathrm{NACK}) = 1, \tag{29}$$

PB|p(l|t,r,0) = P«(i,r)£(e + (1 - «)C) + (6 - Pi2(6,r))£ + (1 - 6)C, (30) 

and $P_{B|P}(0|b, r,p)=1-P_{B|P}(1|b, r,p)$. Note that ACK/NACK reception implies $b_{k,i}=1$.

We can thus write $\pi_{k+1}=\Pi(\hat\pi_k, \mathbf{r}_k, \mathbf{p}_k)$, for a proper function $\Pi(\cdot)$.

IV. Policy definition and Optimization Problem 


In this section, we present the spectrum sensing and scheduling policies (Sec. IV-A), and we introduce the performance metrics and the optimization problem (Sec. IV-B). Complexity reduction techniques will be developed in the following Secs. V and VI.

A. Spectrum Sensing and Scheduling policies

In the sensing phase, given $\mathcal{H}_k$, the CC chooses $\psi_k$ according to a sensing policy $\psi_k=\psi(\mathcal{H}_k)$. In the scheduling phase, given $\hat{\mathcal{H}}_k=(\mathcal{H}_k, \mathbf{y}_k, \mathbf{A}_k)$, the CC selects $\mathbf{r}_k$ according to a scheduling policy $\mathbf{r}_k=r(\hat{\mathcal{H}}_k)$. We denote the joint sensing-scheduling policy as $(\psi, r)$.

B. Performance metrics and Optimization problem 

We define the average long-term sensing and data transmission cost of the SU network as

$$C_{\mathrm{sensing}}(\psi, r) = \lim_{D\to\infty} \frac{1}{D}\,\mathbb{E}\left[\left.\sum_{k=0}^{D-1} \psi_k B c_S \,\right|\, \pi_0\right], \tag{31}$$

$$C_{\mathrm{sched}}(\psi, r) = \lim_{D\to\infty} \frac{1}{D}\,\mathbb{E}\left[\left.\sum_{k=0}^{D-1}\sum_{i=1}^{F} c_{TX}\, r_{k,i} \,\right|\, \pi_0\right], \tag{32}$$

respectively, where $\pi_0$ is the initial prior belief at the beginning of slot 0. We define the total cost of sensing and data transmission as

$$C(\psi, r) = C_{\mathrm{sensing}}(\psi, r) + C_{\mathrm{sched}}(\psi, r). \tag{33}$$

Finally, we define the average SU/PU throughputs as

$$T_X(\psi, r) = \lim_{D\to\infty} \frac{1}{D}\,\mathbb{E}\left[\left.\sum_{k=0}^{D-1} T_X(\mathbf{b}_k, \mathbf{r}_k) \,\right|\, \pi_0\right], \quad X\in\{S,P\}, \tag{34}$$

where the expectation is with respect to the realization of $\{\mathbf{b}_k, \mathbf{A}_k, \mathbf{y}_k, \mathbf{r}_k, \mathbf{p}_k\}$, induced by $(\psi, r)$. The goal is to determine the joint sensing-scheduling policy $(\psi^*, r^*)$ such that

$$(\psi^*, r^*) = \arg\max_{(\psi,r)} T_S(\psi, r) \quad \text{s.t.} \quad C(\psi,r) \le C^{\max},\ \ T_P(\psi,r) \ge T_P^{\min}, \tag{35}$$

where $C^{\max}$ is the maximum cost of sensing and data transmission, and $T_P^{\min}$ is the minimum PU throughput requirement. Alternatively, we consider the Lagrangian formulation

$$(\psi^*, r^*) = \arg\max_{(\psi,r)} \ \xi T_S(\psi,r) + (1 - \xi)T_P(\psi,r) - \lambda C(\psi,r), \tag{36}$$

where the parameters $\lambda>0$ and $\xi\in(0,1)$ capture the desired trade-off between achieving high PU/SU throughputs and incurring low cost for data transmission and acquisition of state information at the CC. We have the following theorem.

Theorem 1 The prior belief $\pi_k$ is a sufficient statistic to choose the sensing action $\psi_k$ in slot $k$. The posterior belief $\hat\pi_k$ is a sufficient statistic to choose the traffic $\mathbf{r}_k$ in slot $k$.

Proof: See Appendix A. ■ 

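We can thus restrict the design to stationary policies of the form $\psi_k = \psi(\pi_k)$ and $\mathbf{r}_k = r(\hat\pi_k)$, which depend solely on the respective sufficient statistic, so that we can rewrite

$$C_{\mathrm{sensing}}(\psi, r) = B c_S \lim_{D\to\infty} \frac{1}{D}\,\mathbb{E}\left[\left.\sum_{k=0}^{D-1} \psi(\pi_k) \,\right|\, \pi_0\right],$$

$$C_{\mathrm{sched}}(\psi, r) = c_{TX} \lim_{D\to\infty} \frac{1}{D}\,\mathbb{E}\left[\left.\sum_{k=0}^{D-1}\sum_{i=1}^{F} r_i(\hat\pi_k) \,\right|\, \pi_0\right],$$

$$T_X(\psi, r) = \lim_{D\to\infty} \frac{1}{D}\,\mathbb{E}\left[\left.\sum_{k=0}^{D-1}\sum_{\mathbf{b}} \hat\pi_k(\mathbf{b})\,T_X(\mathbf{b}, r(\hat\pi_k)) \,\right|\, \pi_0\right],$$

where the expectation is taken with respect to the sequence $\{(\pi_k, \hat\pi_k), k \ge 0\}$, induced by $(\psi, r)$.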


V. Optimization techniques 


In this section, we develop optimization techniques to solve the optimization problem (36) with lower complexity. In particular, in Sec. V-A, we first introduce the optimal DP algorithm, which exploits Theorem 1 to decouple the optimization of sensing and scheduling, and discuss its enormous complexity. Then, in Sec. V-B, we present our proposed partially myopic scheduling scheme, which enables complexity reduction in the DP optimization. Due to the POMDP formulation, in Sec. VI we will resort to belief approximation based on KLD minimization, which enables the use of sparse recovery techniques to estimate the spectrum occupancy.

A. Optimal DP algorithm: decoupling the optimization of sensing and scheduling

The optimal solution of (36) can be found via DP. In particular, we can exploit Theorem 1 to decouple the DP algorithm into two sub-stages, which exploit the different sufficient statistics used in the spectrum sensing and scheduling phases, respectively.

Algorithm 1 (Optimal sensing-scheduling DP) 1) Initialize $V^{[0]}(\pi) = 0$, $\forall\pi$; $l = 1$;

2) Scheduling optimization stage: in the $l$th iteration, determine, $\forall\hat\pi$,

$$\hat V^{[l]}(\hat\pi) = \max_{\mathbf{r}\in[0,1]^F} \sum_{\mathbf{b}} \hat\pi(\mathbf{b}) \Big\{\xi T_S(\mathbf{b}, \mathbf{r}) + (1 - \xi)T_P(\mathbf{b}, \mathbf{r}) - \lambda c_{TX}\mathbf{1}^T\mathbf{r} + \mathbb{E}\left[\left.V^{[l-1]}\big(\Pi(\hat\pi, \mathbf{r}, \mathbf{p})\big) \right| \mathbf{b}, \mathbf{r}\right]\Big\}, \tag{37}$$

where the expectation is with respect to the realization of $\mathbf{p}$, conditioned on $\mathbf{b}$ and $\mathbf{r}$; the maximizer is the optimal SU data traffic in the $l$th stage, $r^{[l]}(\hat\pi)$;

3) Sensing evaluation stage: in the $l$th iteration, determine, $\forall\pi$, $\forall m \in \{0,1,\dots,B\}$,

$$\hat V_m^{[l]}(\pi) = \sum_{\mathbf{b}}\pi(\mathbf{b})\,\mathbb{E}\left[\left.\hat V^{[l]}\big(\hat\Pi(\pi, \mathbf{A}^{(m)}, \mathbf{y}^{(m)})\big)\right| \mathbf{b}, m\right],$$

where the expectation is with respect to the realization of the measurement matrix $\mathbf{A}^{(m)} \in \mathbb{R}^{F\times m}$ and measurement vector $\mathbf{y}^{(m)} \sim \mathcal{N}(\mathbf{A}^{(m)T}\mathbf{b}, \sigma_w^2\mathbf{I}_m)$, conditioned on $\mathbf{b}$ and the number of measurements received, $m$;

4) Sensing optimization stage: in the $l$th iteration, determine, $\forall\pi$,

$$V^{[l]}(\pi) = \max_{\psi} \sum_{\mathbf{b}} \pi(\mathbf{b})\left\{ -\lambda\psi B c_S+\sum_{m=0}^{B} P_M(m|\psi)\,\hat V_m^{[l]}(\pi) \right\};$$

the maximizer is the optimal sensing traffic in the $l$th stage, $\psi^{[l]}(\pi)$;

5) Repeat from step 2) with $l := l + 1$ until convergence; return the policy $(\psi^{[l]}, r^{[l]})$.

Remark 3 The term $\hat V_m^{[l]}(\pi)$ in step 3) represents an evaluation of the cost-to-go function when $m$ measurements are collected at the CC, and is independent of the scheme employed by the SUs to report their measurements. On the other hand, the term $V^{[l]}(\pi)$ in step 4) evaluates the cost-to-go function under the specific reporting scheme, as described in Sec. III-B.


Importantly, Theorem 1 allows us to relax the joint optimization of the sensing and scheduling actions and, instead, decouple it into two sub-stages: the first one, the scheduling optimization stage, uses only the posterior belief information to determine the optimal SU data traffic; the second one, the sensing optimization stage, uses only the prior belief information to determine the optimal SU sensing traffic. The proposed algorithm, thus, effectively captures the sequential structure of the decision making process, i.e., the prior belief drives the sensing traffic, which, in turn, determines the posterior belief, based on which the SU data traffic is scheduled, and so on.

Despite the complexity reduction obtained by decoupling the DP optimization into sub-stages, the DP algorithm has enormous complexity, due to the POMDP formulation and the huge action space. In particular, the prior and posterior beliefs $\pi$ and $\hat\pi$ are defined over a $2^F$-dimensional space of all possible realizations of the PU spectrum occupancy state, leading to the curse of dimensionality. In Sec. VI, the POMDP formulation is relaxed by projecting the prior and posterior beliefs onto a lower dimensional manifold, thus leading to a compact belief representation.

Additionally, the SU traffic $\mathbf{r}$ is defined over the set $[0,1]^F$, leading to huge complexity in the scheduling optimization stage due to the huge action space. In Sec. V-B, we propose a partially myopic scheduling scheme to relax this dimensionality issue.


B. Partially Myopic scheduling scheme 

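Let $\Lambda_k=\sum_i r_{k,i}$ be the total SU traffic budget in the scheduling phase in slot $k$. Then, we can decouple the scheduling policy $r(\hat\pi)$ into the following sub-policies: a policy $\Lambda(\hat\pi)\in[0, F]$, which decides on the total traffic budget allocated as a function of $\hat\pi$, and a policy $z(\hat\pi)\in\mathcal{Z}$, which assigns the total budget to the different spectrum bands, where $\mathcal{Z}=\{\mathbf{z} : \sum_i z_i = 1, \mathbf{z} \ge 0\}$. We can thus write the one-to-one mapping between $r$ and $(\Lambda, z)$ as

$$r(\hat\pi) = \Lambda(\hat\pi)\,z(\hat\pi). \tag{38}$$

Then, step 2) in the DP Algorithm can be replaced with

$$\hat V^{[l]}(\hat\pi) = \max_{\Lambda,\mathbf{z}} \sum_{\mathbf{b}} \hat\pi(\mathbf{b}) \Big\{\xi T_S(\mathbf{b}, \Lambda\mathbf{z}) + (1 - \xi)T_P(\mathbf{b}, \Lambda\mathbf{z}) - \lambda c_{TX}\Lambda + \mathbb{E}\left[\left.V^{[l-1]}\big(\Pi(\hat\pi, \Lambda\mathbf{z}, \mathbf{p})\big) \right| \mathbf{b}, \Lambda\mathbf{z}\right]\Big\}, \tag{39}$$

thus yielding the optimal $\Lambda^{[l]}(\hat\pi)$ and $z^{[l]}(\hat\pi)$. Note that the optimization over $\Lambda\in[0,F]$ can be carried out with complexity linear in $F$, since the total traffic budget $\Lambda$ is a scalar quantity taking value in the closed set $[0,F]$. On the other hand, the optimization over $\mathbf{z}$ has high complexity since $\mathbf{z}\in\mathcal{Z}$, and the action space $\mathcal{Z}$ grows exponentially with the number of frequency bands $F$. In order to reduce the complexity, using a similar approach as in [30], we use a myopic approach to approximate $z(\hat\pi, \Lambda)$ for a given total budget $\Lambda$, namely,

$$z(\hat\pi, \Lambda) = \arg\max_{\mathbf{z}\in\mathcal{Z}} \sum_{\mathbf{b}} \hat\pi(\mathbf{b}) \left[\xi T_S(\mathbf{b},\Lambda\mathbf{z}) + (1 -\xi)T_P(\mathbf{b},\Lambda\mathbf{z}) - \lambda c_{TX}\Lambda\right], \tag{40}$$

which corresponds to the instantaneous cost in the DP stage (37), without the cost-to-go term $\mathbb{E}[V^{[l-1]}]$. Using the definitions of the SU and PU throughputs (see (21)), we can rewrite this optimization problem as

$$z(\hat{\boldsymbol\beta}, \Lambda)= \arg\max_{\mathbf{z}\ge 0} \sum_{i=1}^{F} \left[(1-\hat\beta_i)\Lambda z_i + \frac{1-\xi}{\xi}\,\frac{1-\rho_P}{1-\rho_S}\,\hat\beta_i\right] e^{-\Lambda z_i} \quad \text{s.t.} \quad \sum_i z_i = 1, \tag{41}$$

where we have defined the expected posterior occupancy vector

$$\hat{\boldsymbol\beta} = \sum_{\mathbf{b}} \mathbf{b}\,\hat\pi(\mathbf{b}), \tag{42}$$

and we have expressed $z(\hat{\boldsymbol\beta}, \Lambda)$ as a function of $\hat{\boldsymbol\beta}$ only, rather than of the posterior belief $\hat\pi$.

Additionally, we can further bound the feasible values of the total traffic budget $\Lambda$ as follows. Let $\mathbf{r}_{\max}(\hat{\boldsymbol\beta})$ be the solution of the unconstrained optimization problem

$$\mathbf{r}_{\max}(\hat{\boldsymbol\beta})= \arg\max_{\mathbf{r}\ge 0} \sum_{\mathbf{b}} \hat\pi(\mathbf{b}) \left[\xi T_S(\mathbf{b}, \mathbf{r}) + (1 - \xi)T_P(\mathbf{b}, \mathbf{r})\right]$$

$$= \arg\max_{\mathbf{r}\ge 0} \sum_{i=1}^{F}\left[\xi(1-\hat\beta_i)(1-\rho_S)\,r_i + (1-\xi)\hat\beta_i(1-\rho_P)\right]e^{-r_i} = \left[1 - \frac{\hat{\boldsymbol\beta}}{1-\hat{\boldsymbol\beta}}\,\frac{(1-\xi)(1-\rho_P)}{\xi(1-\rho_S)}\right]^{+}, \tag{43}$$

where we have defined $[\cdot]^{+} = \max\{\cdot,0\}$ and component-wise operations. Note that $\mathbf{r}_{\max}(\hat{\boldsymbol\beta})$ is the value of the SU traffic which maximizes the trade-off between the instantaneous PU and SU throughputs, as a function of the expected occupancy $\hat{\boldsymbol\beta}$. If the SU traffic in the $i$th spectrum band is such that $r_{k,i} > r_{\max,i}(\hat{\boldsymbol\beta})$, then the following undesirable outcomes occur: a smaller trade-off between PU and SU throughputs is achieved, since $r_{\max,i}(\hat{\boldsymbol\beta})$ optimizes such trade-off (see (43)); a larger transmission cost is incurred by the SUs in the scheduling phase; collisions with the PU operating in the $i$th spectrum band are more likely to occur, so that the $i$th spectrum band is more likely to be occupied in the next slot, due to the retransmission mechanism. Therefore, we restrict $r(\hat\pi_k)$ to take values $0 \le r(\hat\pi_k) \le \mathbf{r}_{\max}(\hat{\boldsymbol\beta})$, so that

$$\Lambda(\hat\pi) \le \sum_i r_{\max,i}(\hat{\boldsymbol\beta}) = \Lambda_{\max}(\hat{\boldsymbol\beta}), \tag{44}$$

and, for a given $\Lambda \in [0, \Lambda_{\max}(\hat{\boldsymbol\beta})]$, $\mathbf{z} \le \mathbf{r}_{\max}(\hat{\boldsymbol\beta})/\Lambda$. Hence, (41) is equivalent to

$$z(\hat{\boldsymbol\beta}, \Lambda)= \arg\max_{\mathbf{z}} \sum_{i=1}^{F} \left[(1-\hat\beta_i)\Lambda z_i + \frac{1-\xi}{\xi}\,\frac{1-\rho_P}{1-\rho_S}\,\hat\beta_i\right] e^{-\Lambda z_i} \quad \text{s.t.} \quad \sum_i z_i=1,\ \ 0\le\mathbf{z}\le\frac{\mathbf{r}_{\max}(\hat{\boldsymbol\beta})}{\Lambda}. \tag{45}$$

The partially myopic scheme and the optimization problem (45) have the following properties.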
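Theorem 2 1) The optimization problem (45) is concave;

2) If $\hat\beta_i \ge \frac{\xi(1-\rho_S)}{(1-\xi)(1-\rho_P)+\xi(1-\rho_S)}$ for some $i$, then $r_i(\hat{\boldsymbol\beta}, \Lambda) = 0$;

3) The SU traffic $r(\hat{\boldsymbol\beta}, \Lambda) = \Lambda z(\hat{\boldsymbol\beta}, \Lambda)$ is a non-decreasing function of $\Lambda$ (component-wise);

4) For a given $\Lambda \in [0, \Lambda_{\max}(\hat{\boldsymbol\beta})]$, if $\hat\beta_i > \hat\beta_j$ for some $i \ne j$, then $r_i(\hat{\boldsymbol\beta}, \Lambda) \le r_j(\hat{\boldsymbol\beta}, \Lambda)$.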


Proof: See Appendix B. ■ 

Property 3) states that, when the budget $\Lambda$ increases, the traffic scheduled in each spectrum band does not decrease; Properties 2) and 4) state that more traffic is scheduled in those bands more likely to be idle, and no traffic is scheduled in those bands likely to be occupied by a PU. All these properties are desirable, since they ensure that the SU traffic is scheduled only to those bands more likely to be idle, thus minimizing the interference to the PUs. The implication of Property 1) is that (45) can be solved efficiently using standard convex optimization tools [31].


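While $z(\hat{\boldsymbol\beta}, \Lambda)$ is obtained myopically as the solution of problem (45), the total traffic budget $\Lambda(\hat\pi)$ is determined optimally as the solution of the DP stage

$$\hat V^{[l]}(\hat\pi) =\max_{\Lambda}\sum_{\mathbf{b}}\hat\pi(\mathbf{b}) \Big\{\xi T_S(\mathbf{b}, \Lambda z(\hat{\boldsymbol\beta}, \Lambda)) + (1 - \xi)T_P(\mathbf{b}, \Lambda z(\hat{\boldsymbol\beta}, \Lambda)) - \lambda c_{TX}\Lambda + \mathbb{E}\left[\left.V^{[l-1]}\big(\Pi(\hat\pi,\Lambda z(\hat{\boldsymbol\beta},\Lambda),\mathbf{p})\big)\right| \mathbf{b},\Lambda z(\hat{\boldsymbol\beta},\Lambda)\right]\Big\}, \tag{46}$$

which replaces step 2) in the DP Algorithm and can be solved with linear complexity, rather than exponential complexity as in the original DP step 2); hence the name partially myopic scheduling scheme, obtained by combining an optimal DP solution of the total traffic budget with a myopic scheduling of the total traffic budget across spectrum bands.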
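The myopic allocation step can be sketched numerically. The Python sketch below assumes the per-band objective $[(1-\hat\beta_i) r_i + c\,\hat\beta_i]e^{-r_i}$ with $c = \frac{(1-\xi)(1-\rho_P)}{\xi(1-\rho_S)}$, as implied by (43), and splits a budget $\Lambda$ across bands by bisection on a common marginal-gain "price", in the spirit of the Lagrangian argument used in Appendix B. All function names are hypothetical, and the nested bisection is one illustrative solver among many; a generic convex optimization tool would serve equally well.

```python
import numpy as np

def tradeoff_constant(xi, rho_s, rho_p):
    # c = (1 - xi)(1 - rho_P) / (xi (1 - rho_S)), the weight of the PU term.
    return (1.0 - xi) * (1.0 - rho_p) / (xi * (1.0 - rho_s))

def r_max(beta, xi, rho_s, rho_p):
    """Per-band traffic cap: [1 - c * beta / (1 - beta)]^+ (component-wise)."""
    beta = np.clip(np.asarray(beta, dtype=float), 0.0, 1.0 - 1e-12)
    c = tradeoff_constant(xi, rho_s, rho_p)
    return np.maximum(1.0 - c * beta / (1.0 - beta), 0.0)

def _marginal_gain(r, beta, c):
    # Derivative of [(1 - beta) r + c beta] exp(-r) with respect to r.
    return ((1.0 - beta) * (1.0 - r) - c * beta) * np.exp(-r)

def _traffic_for_price(price, beta, c, cap, iters=60):
    # Per band, find r in [0, cap] whose marginal gain equals `price`
    # (the gain is decreasing on [0, cap]); bands that never reach it get 0.
    lo, hi = np.zeros_like(cap), cap.copy()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        above = _marginal_gain(mid, beta, c) > price
        lo = np.where(above, mid, lo)
        hi = np.where(above, hi, mid)
    return 0.5 * (lo + hi)

def myopic_allocation(beta, budget, xi, rho_s, rho_p, iters=60):
    """Split `budget` across bands; returns the per-band traffic r = budget * z."""
    beta = np.clip(np.asarray(beta, dtype=float), 0.0, 1.0 - 1e-12)
    c = tradeoff_constant(xi, rho_s, rho_p)
    cap = r_max(beta, xi, rho_s, rho_p)
    if budget < 0 or budget > cap.sum() + 1e-9:
        raise ValueError("budget must lie in [0, sum of per-band caps]")
    lo, hi = 0.0, max(float(np.max(_marginal_gain(0.0, beta, c))), 0.0)
    for _ in range(iters):
        price = 0.5 * (lo + hi)
        if _traffic_for_price(price, beta, c, cap).sum() > budget:
            lo = price   # too much traffic allocated: raise the price
        else:
            hi = price
    return _traffic_for_price(hi, beta, c, cap)

# Usage with illustrative values: four bands, total budget 1.0.
r = myopic_allocation([0.05, 0.2, 0.6, 0.9], 1.0, xi=0.7, rho_s=0.1, rho_p=0.1)
```

In this sketch the band with expected occupancy 0.9 receives no traffic, and bands with lower expected occupancy receive more, which is the qualitative behavior stated in Theorem 2.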

VI. Complexity reduction 


Although the dynamics of the spectrum bands evolve independently across frequency, the compressed spectrum measurements (23) introduce frequency correlation, as is evident from the belief update (24). Therefore, in general, the information available at the CC is represented by a belief $\pi(\mathbf{b})$, which may not factorize across frequency bands, resulting in high dimensionality and huge optimization and operational complexity of the system.

In this section, we propose a compact belief representation, which makes it possible to optimize and operate the system on a lower dimensional subspace. In particular, in Sec. VI-A, we will resort to a compact belief representation via KLD minimization. Then, in Sec. VI-B, we will show how this compact representation can be exploited to design sparse recovery techniques to estimate the spectrum occupancy. Finally, in Sec. VI-C, we discuss the computation of the transition probabilities in the compact belief representation, which are required in the DP algorithm.


A. Compact belief representation via KLD minimization 

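In order to reduce the high dimensionality entailed by the POMDP formulation, we propose a compact state space representation by projecting the belief onto a low-dimensional manifold via KLD minimization. We approximate the belief $\pi(\mathbf{b})$ with the factorized model

$$\pi(\mathbf{b}) \approx \tilde\pi(\mathbf{b}) = \prod_i \left[\beta^{(\phi(i))}\right]^{b_i}\left[1 - \beta^{(\phi(i))}\right]^{1-b_i}, \tag{47}$$

where $\beta^{(L)}, \beta^{(H)} \in [0,1]$ with $\beta^{(L)} \le \beta^{(H)}$ are low (L) and high (H) probability levels, and $\phi : \{1,2,\dots,F\} \mapsto \{L,H\}$ is a function which maps the $i$th spectrum band to indices corresponding to one of the levels $\beta^{(L)}$ or $\beta^{(H)}$. Note that this approximation assumes that the spectrum bands are statistically independent of each other, and that their probability of being occupied takes two possible values, $\beta^{(L)}$ or $\beta^{(H)}$. We can alternatively interpret the bands with high probability of occupancy $\beta^{(H)}$ as those detected to be occupied, so that $P_{FA} = 1 - \beta^{(H)}$ is the corresponding false-alarm probability. Similarly, the bands with low probability of occupancy $\beta^{(L)}$ are those detected to be idle, so that $P_{MD} = \beta^{(L)}$ is the corresponding missed-detection probability.

The approximate belief $\tilde\pi(\mathbf{b})$ is parameterized by $(\beta^{(L)}, \beta^{(H)}, \phi)$. We thus denote $\tilde\pi = (\beta^{(L)}, \beta^{(H)}, \phi)$. The KLD between $\pi$ and $\tilde\pi$ is given by

$$D(\pi\|\tilde\pi) = \sum_{\mathbf{b}\in\{0,1\}^F} \pi(\mathbf{b})\ln\frac{\pi(\mathbf{b})}{\tilde\pi(\mathbf{b})}.$$

The goal is, given $\pi$, to find parameters $(\beta^{(L)*},\beta^{(H)*},\phi^*)(\pi)$ such that

$$(\beta^{(L)*},\beta^{(H)*},\phi^*)(\pi)=\arg\min_{\beta^{(L)},\beta^{(H)},\phi} D(\pi\|\tilde\pi) \tag{48}$$

$$= \arg\max_{\beta^{(L)},\beta^{(H)},\phi} \sum_i \left[\beta_i \ln \beta^{(\phi(i))} +(1-\beta_i) \ln\left(1-\beta^{(\phi(i))}\right)\right],$$

where we have used (47) and defined $\boldsymbol\beta=\mathbb{E}[\mathbf{b}_k|\pi_k=\pi]$. Theorem 3 determines the solution of (48).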


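Theorem 3 The solution of (48) is given by

$$\phi^*(m(i)) = \begin{cases} L, & i \le \nu^*(\pi), \\ H, & i > \nu^*(\pi), \end{cases} \tag{49}$$

where

$$\beta^{(L)*} = \frac{1}{\nu^*}\sum_{i=1}^{\nu^*} \beta_{m(i)}, \qquad \beta^{(H)*} = \frac{1}{F-\nu^*}\sum_{i=\nu^*+1}^{F} \beta_{m(i)}, \tag{50}$$

$m : \{1, 2,\dots, F\} \mapsto \{1, 2,\dots, F\}$ is a permutation of the entries of $\boldsymbol\beta$ in increasing order, i.e., such that $\beta_{m(1)} \le \beta_{m(2)} \le \dots \le \beta_{m(F)}$, and $\nu^*(\pi)$ solves

$$\nu^*(\pi) = \arg\min_{\nu\in\{1,\dots,F-1\}} \ \nu H_2\!\left(\frac{1}{\nu}\sum_{i=1}^{\nu}\beta_{m(i)}\right) + (F - \nu)H_2\!\left(\frac{1}{F-\nu}\sum_{i=\nu+1}^{F}\beta_{m(i)}\right), \tag{51}$$

where $H_2(x)=-x\ln(x) - (1-x) \ln (1-x)$ is the binary entropy function.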


Proof: See Appendix C. ■ 
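The projection described in Theorem 3 amounts to sorting the expected occupancies, scanning the split point $\nu$, and averaging each group. The following Python sketch illustrates this under the reading of (49)-(51) in which the two levels are the group means and the split minimizes the weighted sum of binary entropies. The function names and the returned tuple layout are hypothetical.

```python
import numpy as np

def binary_entropy(x):
    # H_2(x) = -x ln x - (1 - x) ln(1 - x), clipped to avoid log(0).
    x = np.clip(x, 1e-12, 1.0 - 1e-12)
    return -x * np.log(x) - (1.0 - x) * np.log(1.0 - x)

def project_to_cbs(beta):
    """Project expected occupancies onto a two-level factorized belief.

    Returns (beta_low, beta_high, nu, levels), where `nu` bands are mapped
    to the low level "L" and the remaining F - nu bands to the high level "H".
    """
    beta = np.asarray(beta, dtype=float)
    F = beta.size
    order = np.argsort(beta, kind="stable")   # permutation m(.) in increasing order
    prefix = np.cumsum(beta[order])
    best = None
    for nu in range(1, F):                    # nu in {1, ..., F-1}
        low = prefix[nu - 1] / nu
        high = (prefix[-1] - prefix[nu - 1]) / (F - nu)
        cost = nu * binary_entropy(low) + (F - nu) * binary_entropy(high)
        if best is None or cost < best[0]:
            best = (cost, nu, low, high)
    _, nu, low, high = best
    levels = ["H"] * F
    for idx in order[:nu]:
        levels[idx] = "L"
    return low, high, nu, levels

# Usage: three bands look idle, two look occupied.
beta_low, beta_high, nu, levels = project_to_cbs([0.9, 0.1, 0.05, 0.95, 0.1])
```

The scan over $\nu$ costs $O(F)$ after an $O(F \log F)$ sort, so the projection is inexpensive compared with handling the full belief over $2^F$ states.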

A sufficient statistic to represent the approximate belief $\tilde\pi$ is the compressed belief state (CBS) $\mathbf{s} = (\beta^{(L)}, \beta^{(H)}, \nu)$, where $\nu\in\{1, 2, \dots, F - 1\}$ is the number of bands detected as idle, and $P_{FA}=1-\beta^{(H)}$ and $P_{MD}=\beta^{(L)}$ are the false-alarm and missed-detection probabilities for the bands detected as busy and idle, respectively. In fact, any $\phi(\cdot)$ which maps $\nu$ spectrum bands to the low probability of occupancy $\beta^{(L)}$ and the remaining $F-\nu$ spectrum bands to the high probability of occupancy $\beta^{(H)}$ can be obtained by a proper permutation of the spectrum bands, which preserves the dynamics of the system, due to the symmetry of the spectrum bands across frequency. Thus, the specific $\phi(\cdot)$ need not be taken into account, but only the number of bands detected as idle, $\nu$. We denote the projection operator as $\mathbf{s}=\mathcal{P}(\boldsymbol\beta)$ or $\mathbf{s}=\mathcal{P}(\pi)$ (used interchangeably).

Therefore, given the prior belief $\pi_k$ and posterior belief $\hat\pi_k$, we denote the prior CBS as $\mathbf{s}_k = \mathcal{P}(\pi_k)$ and the posterior CBS as $\hat{\mathbf{s}}_k = \mathcal{P}(\hat\pi_k)$, determined as in Theorem 3. We then define the policy $\psi_k = \psi(\mathbf{s}_k)$, which maps the prior CBS to a value of the sensing traffic $\psi_k$, and the policy $\Lambda_k = \Lambda(\hat{\mathbf{s}}_k)$, which maps the posterior CBS to a value of the total traffic budget $\Lambda_k$. While the prior and posterior beliefs $\pi_k$ and $\hat\pi_k$ are probability distributions over a space of size $2^F$ (all the possible realizations of the spectrum occupancy vector $\mathbf{b}_k$), which scales exponentially with the spectrum size $F$, the CBS takes value from a low-dimensional space, which scales linearly with $F$. Therefore, $\psi(\mathbf{s}_k)$ and $\Lambda(\hat{\mathbf{s}}_k)$ can be found with lower complexity than $\psi(\pi_k)$ and $\Lambda(\hat\pi_k)$.

Despite the dimensionality reduction achieved by operating based on the CBS, computing $\hat\pi_k = \hat\Pi(\pi_k, \mathbf{A}_k, \mathbf{y}_k)$ and $\hat{\boldsymbol\beta}_k$ in the sensing phase via (24) has exponential complexity. To achieve complexity reduction, we propose to decouple the estimator from the CC, i.e., the estimator is treated as a black box with input $(\boldsymbol\beta_k, \mathbf{A}_k,\mathbf{y}_k)$, which outputs a maximum-a-posteriori (MAP) estimate $\hat{\mathbf{b}}_k^{(MAP)}$ of $\mathbf{b}_k$ (Sec. VI-B) and posterior false-alarm and missed-detection probabilities for the bands detected as busy and idle, denoted as $p_{FA,k}$ and $p_{MD,k}$, respectively. Given $\hat{\mathbf{b}}_k^{(MAP)}$, $p_{FA,k}$ and $p_{MD,k}$, the CC approximates the posterior expected occupancy as

$$\hat\beta_{k,i} = \hat b_{k,i}^{(MAP)}(1 - p_{FA,k}) + \left(1 - \hat b_{k,i}^{(MAP)}\right)p_{MD,k}, \tag{52}$$

from which the CBS $\hat{\mathbf{s}}_k = (\hat\beta_k^{(L)}, \hat\beta_k^{(H)}, \hat\nu_k)$ is determined as $\hat\beta_k^{(L)} = p_{MD,k}$, $\hat\beta_k^{(H)} = 1 - p_{FA,k}$ and $\hat\nu_k = F - \sum_i \hat b_{k,i}^{(MAP)}$, and the mapping function $\hat\phi_k$ as $\hat\phi_k(i) = L \Leftrightarrow \hat b_{k,i}^{(MAP)} = 0$.
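As a small illustration of (52), the posterior CBS can be assembled directly from the estimator outputs. The sketch below is in Python; the function name `posterior_cbs` and the tuple layout `(beta_low, beta_high, nu)` are assumptions made for this example only.

```python
import numpy as np

def posterior_cbs(b_map, p_fa, p_md):
    """Build the posterior expected occupancy and CBS from a MAP estimate.

    b_map : binary vector, 1 = band detected as busy, 0 = detected as idle
    p_fa  : posterior false-alarm probability (bands detected as busy)
    p_md  : posterior missed-detection probability (bands detected as idle)
    """
    b_map = np.asarray(b_map, dtype=int)
    # Busy bands are occupied w.p. 1 - p_fa; idle bands are occupied w.p. p_md.
    beta_hat = b_map * (1.0 - p_fa) + (1 - b_map) * p_md
    nu_hat = int(b_map.size - b_map.sum())        # number of bands detected as idle
    levels = ["H" if b else "L" for b in b_map]   # per-band level assignment
    return beta_hat, (p_md, 1.0 - p_fa, nu_hat), levels

# Usage: two of four bands detected as busy.
beta_hat, cbs, levels = posterior_cbs([1, 0, 0, 1], p_fa=0.1, p_md=0.05)
```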

B. Spectrum estimation via sparse recovery 
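Given the prior $\boldsymbol\beta_k = \mathbb{E}[\mathbf{b}_k|\pi_k]$, $\hat{\mathbf{b}}_k^{(MAP)}$ solves

$$\hat{\mathbf{b}}_k^{(MAP)} = \arg\max_{\mathbf{b}\in\{0,1\}^F} P(\mathbf{b}_k = \mathbf{b}|\boldsymbol\beta_k, \mathbf{A}_k,\mathbf{y}_k) = \arg\min_{\mathbf{b}\in\{0,1\}^F} \left\|\mathbf{y}_k - \mathbf{A}_k^T\mathbf{b}\right\|_2^2 + 2\sigma_w^2 \sum_i b_i \ln\frac{1-\beta_{k,i}}{\beta_{k,i}}, \tag{53}$$

where we have assumed the factorized prior distribution

$$P(\mathbf{b}_k = \mathbf{b}|\boldsymbol\beta_k) = \prod_i \beta_{k,i}^{b_i}(1-\beta_{k,i})^{1-b_i}.$$

In particular, letting $\hat{\mathbf{b}}_k^{(map)} = \chi(\boldsymbol\beta_k > 0.5)$ be the maximum-a-priori estimate, we can rewrite

$$\hat{\mathbf{b}}_k^{(MAP)}=\hat{\mathbf{b}}_k^{(map)} \oplus \hat{\mathbf{e}}_k^{(MAP)}, \tag{54}$$

where $\oplus$ denotes component-wise modulo-2 addition and $\hat{\mathbf{e}}_k^{(MAP)}$ is the correction vector informed by the measurement matrix $\mathbf{A}_k$ and observation vector $\mathbf{y}_k$. By plugging (54) into (53), this is given by the solution of the optimization problem

$$\hat{\mathbf{e}}_k^{(MAP)} = \arg\min_{\mathbf{e}\in\{0,1\}^F} \left\|\tilde{\mathbf{y}}_k - \tilde{\mathbf{A}}_k^T\mathbf{e}\right\|_2^2 + \boldsymbol\mu_k^T\mathbf{e}, \tag{55}$$

where we have defined

$$\tilde{\mathbf{y}}_k \triangleq \mathbf{y}_k - \mathbf{A}_k^T\hat{\mathbf{b}}_k^{(map)}, \qquad \tilde{\mathbf{A}}_k \triangleq \left(\mathbf{I}_F - 2\,\mathrm{diag}(\hat{\mathbf{b}}_k^{(map)})\right) \mathbf{A}_k, \tag{56}$$

as the residual error from the prior estimate and the corrected measurement matrix, respectively, and $\boldsymbol\mu_k$ is a Lagrangian multiplier column vector with components

$$\mu_{k,i} = 2\sigma_w^2 \left(1 - 2\hat b_{k,i}^{(map)}\right) \ln\frac{1-\beta_{k,i}}{\beta_{k,i}}. \tag{57}$$

Note that the Lagrangian vector $\boldsymbol\mu_k$ weights the components of the error vector $\mathbf{e}$ based on their prior log-likelihood. As a result, each component $e_i$ may be weighted in a different way, according to its prior. Moreover, from the definition of the prior estimate $\hat{\mathbf{b}}_k^{(map)}$, we have that $\mu_{k,i} \ge 0$, with equality if and only if $\beta_{k,i} = 0.5$.

The optimization problem (55) has combinatorial complexity, since the cost function needs to be evaluated for each $\mathbf{e} \in \{0,1\}^F$. In order to overcome the combinatorial complexity, we propose the following convex $\ell_1$ relaxation:

$$\hat{\mathbf{e}}_k = \arg\min_{\mathbf{e}\in[0,1]^F} \left\|\tilde{\mathbf{y}}_k - \tilde{\mathbf{A}}_k^T\mathbf{e}\right\|_2^2 + \boldsymbol\mu_k^T\mathbf{e}, \tag{58}$$

i.e., the optimization is over the convex set $[0,1]^F$, rather than the discrete one $\{0,1\}^F$, and can thus be solved using convex optimization techniques [31]. In particular, it is a quadratic programming problem minimizing a least-squares term, plus an $\ell_1$ regularization term, which induces sparsity in the correction vector $\hat{\mathbf{e}}_k$. The larger $\mu_{k,i}$ (i.e., the closer $\beta_{k,i}$ to 0 or 1), the sparser the solution. Note that $\hat{\mathbf{e}}_k$ is not feasible with respect to the original optimization problem (55). A feasible point is thus obtained by projecting $\hat{\mathbf{e}}_k$ into the discrete set $\{0,1\}^F$ using, e.g., the minimum distance criterion $\chi(\hat{\mathbf{e}}_k > 0.5)$. This solution is not globally optimal with respect to (55), and can be improved using the following hill climbing algorithm.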

Algorithm 2 (Hill climbing algorithm) 1) Initialization: $\hat{\mathbf{e}}_k^{[0]} = \chi(\hat{\mathbf{e}}_k > 0.5)$, counter $l = 0$;

2) Improvement step: at step $l$, compute the vector $\boldsymbol\Delta^{[l]}$, with the $i$th component given by

$$\Delta_i^{[l]} = \left(2\hat e_{k,i}^{[l]} - 1\right)\left(-2[\tilde{\mathbf{A}}_k\tilde{\mathbf{y}}_k]_i + 2 \sum_j[\tilde{\mathbf{A}}_k\tilde{\mathbf{A}}_k^T]_{i,j}\,\hat e_{k,j}^{[l]} + \mu_{k,i}\right) - [\tilde{\mathbf{A}}_k\tilde{\mathbf{A}}_k^T]_{i,i}.$$

Let $i^* = \arg\max_i \Delta_i^{[l]}$; if $\Delta_{i^*}^{[l]} > 0$, determine a new MAP estimate as $\hat e_{k,i}^{[l+1]} = \hat e_{k,i}^{[l]}$, $\forall i \ne i^*$, $\hat e_{k,i^*}^{[l+1]} = 1 - \hat e_{k,i^*}^{[l]}$, update the counter $l := l + 1$ and repeat from the improvement step; otherwise, return $\hat{\mathbf{e}}_k^{(MAP)} = \hat{\mathbf{e}}_k^{[l]}$.

The term $\Delta_i^{[l]}$ represents the increase or decrease in the MAP cost function (58), by switching the $i$th component of the current MAP estimate, $\hat{\mathbf{e}}_k^{[l]}$, from 1 to 0, or vice versa, and keeping all the other components unchanged. In particular, $\Delta_i^{[l]}$ is the difference in the cost function (58) between the old cost and the new one, so that $\Delta_i^{[l]} > 0$ if an improved estimate is obtained. By the definition of $i^*$, if $\Delta_{i^*}^{[l]} > 0$, by switching the $i^*$th band of the current MAP estimate, the MAP cost function (58) is decreased by the amount $\Delta_{i^*}^{[l]}$, yielding an improved estimate. Otherwise ($\Delta_{i^*}^{[l]} \le 0$), a local optimum has been determined by the algorithm, i.e., any change of one and only one component of the current MAP estimate is sub-optimal.
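The estimation pipeline of this subsection (residual problem, box-constrained relaxation, thresholding, hill climbing) can be sketched as follows in Python. This is an illustration under stated assumptions: the measurement matrix is stored as an $F \times M$ array, the relaxation is solved with plain projected gradient descent rather than any particular QP solver, the flip gain is computed as "old cost minus new cost" for the quadratic-plus-linear cost, and all function names are hypothetical.

```python
import numpy as np

def correction_problem(beta, A, y, noise_var):
    """Build the residual, corrected matrix and weights from the prior."""
    b_prior = (beta > 0.5).astype(float)            # maximum-a-priori estimate
    sign = 1.0 - 2.0 * b_prior
    y_t = y - A.T @ b_prior                         # residual from the prior estimate
    A_t = sign[:, None] * A                         # (I - 2 diag(b_prior)) A
    mu = 2.0 * noise_var * sign * np.log((1.0 - beta) / beta)
    return b_prior, y_t, A_t, mu

def map_cost(e, A_t, y_t, mu):
    resid = y_t - A_t.T @ e
    return float(resid @ resid + mu @ e)

def relaxed_correction(A_t, y_t, mu, iters=2000):
    """Projected gradient descent over the box [0,1]^F (convex relaxation)."""
    gram = A_t @ A_t.T
    # Step size 1/L, where L = 2 * lambda_max(gram) bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * max(np.linalg.eigvalsh(gram)[-1], 1e-12))
    e = np.zeros(A_t.shape[0])
    for _ in range(iters):
        grad = -2.0 * (A_t @ y_t) + 2.0 * (gram @ e) + mu
        e = np.clip(e - step * grad, 0.0, 1.0)
    return e

def flip_gains(e, A_t, y_t, mu):
    """Cost decrease obtained by flipping each single component of e."""
    gram = A_t @ A_t.T
    inner = -2.0 * (A_t @ y_t) + 2.0 * (gram @ e) + mu
    return (2.0 * e - 1.0) * inner - np.diag(gram)

def hill_climb(e0, A_t, y_t, mu, tol=1e-12):
    """Flip the most beneficial component until no single flip helps."""
    e = np.asarray(e0, dtype=float).copy()
    while True:
        gains = flip_gains(e, A_t, y_t, mu)
        i = int(np.argmax(gains))
        if gains[i] <= tol:
            return e                                # local optimum reached
        e[i] = 1.0 - e[i]

def map_estimate(beta, A, y, noise_var):
    b_prior, y_t, A_t, mu = correction_problem(beta, A, y, noise_var)
    e0 = (relaxed_correction(A_t, y_t, mu) > 0.5).astype(float)
    e_hat = hill_climb(e0, A_t, y_t, mu)
    return np.abs(b_prior - e_hat)                  # XOR of prior estimate and correction
```

Each accepted flip strictly lowers the cost over a finite set of binary vectors, so the loop terminates; the result is only a single-flip local optimum, as noted in the text.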

C. CBS transition probabilities 

In order to run the DP algorithm based on the CBS, we need to determine the corresponding transition probabilities. Note that $\hat{\mathbf{b}}_k^{(MAP)}$ can be written as a function of the prior expected occupancy $\boldsymbol\beta_k$, which maps to the corresponding CBS $\mathbf{s}_k = \mathcal{P}(\boldsymbol\beta_k)$, and of $(\mathbf{A}_k,\mathbf{y}_k)$. We denote this function as $\hat{\mathbf{b}}_k^{(MAP)} = \mathrm{MP}(\boldsymbol\beta_k, \mathbf{A}_k, \mathbf{y}_k)$. Similarly, the false-alarm and missed-detection probabilities can be written as functions of $(\boldsymbol\beta_k, \mathbf{A}_k,\mathbf{y}_k)$, and thus are difficult to evaluate, due to their dependence on the measurements $(\mathbf{A}_k,\mathbf{y}_k)$.

Herein, we propose to marginalize the false-alarm and missed-detection probabilities with respect to the measurements $(\mathbf{A}_k,\mathbf{y}_k)$, for a given value of the number of bands detected as idle, $\hat\nu_k = F - \sum_i \mathrm{MP}_i(\boldsymbol\beta_k, \mathbf{A}_k, \mathbf{y}_k)$, the number of measurements received, $M_k$, and the CBS $\mathbf{s}_k$. The rationale is that the detection performance of the MAP estimator is mainly driven by the number of measurements collected at the CC, rather than the specific observations $(\mathbf{A}_k,\mathbf{y}_k)$. Equivalently,


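$$P_{FA}(\mathbf{s}, m, \hat\nu)=\mathbb{E}\left[\left.p_{FA,k}\,\right|\,(\mathbf{s}_k, M_k, \hat\nu_k) = (\mathbf{s},m,\hat\nu)\right] =\mathbb{E}\left[\left.\frac{\sum_i(1-b_{k,i})\,\mathrm{MP}_i(\boldsymbol\beta_k,\mathbf{A}^{(m)},\mathbf{y}^{(m)})}{F-\hat\nu}\,\right|\,(\mathbf{s}_k, M_k, \hat\nu_k) = (\mathbf{s},m,\hat\nu)\right], \tag{59}$$

$$P_{MD}(\mathbf{s},m, \hat\nu)=\mathbb{E}\left[\left.p_{MD,k}\,\right|\,(\mathbf{s}_k, M_k, \hat\nu_k) = (\mathbf{s},m,\hat\nu)\right] =\mathbb{E}\left[\left.\frac{\sum_i b_{k,i}\left(1-\mathrm{MP}_i(\boldsymbol\beta_k,\mathbf{A}^{(m)},\mathbf{y}^{(m)})\right)}{\hat\nu}\,\right|\,(\mathbf{s}_k, M_k, \hat\nu_k) = (\mathbf{s},m,\hat\nu)\right], \tag{60}$$

where $\boldsymbol\beta_k$ maps to the CBS $\mathbf{s} = \mathcal{P}(\boldsymbol\beta_k)$, up to a random permutation of its entries. The expectation is taken with respect to the realization of the measurement matrix $\mathbf{A}^{(m)} \in \mathbb{R}^{F\times m}$, the measurement vector $\mathbf{y}^{(m)}$ (of size $m$), and the random permutation of the entries of $\boldsymbol\beta_k$.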

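Let $P_{\hat\nu}(\hat\nu|\mathbf{s}, m)=P(\hat\nu_k=\hat\nu|\mathbf{s}_k=\mathbf{s}, M_k=m)$ be the pmf of the number of bands detected as idle, given the prior CBS $\mathbf{s}$ and the number of measurements received $m$, after marginalization with respect to $(\mathbf{A}_k,\mathbf{y}_k)$. This is given by

$$P_{\hat\nu}(\hat\nu|\mathbf{s}, m) = \mathbb{E}\left[\left.\chi\left(F-\sum_i \mathrm{MP}_i(\boldsymbol\beta_k, \mathbf{A}^{(m)},\mathbf{y}^{(m)}) = \hat\nu\right)\,\right|\,(\mathbf{s}_k, M_k) = (\mathbf{s}, m)\right]. \tag{61}$$

We define a neighborhood around the (prior or posterior) CBS $\mathbf{s} = (\beta^{(L)}, \beta^{(H)}, \nu)$ as

$$\mathcal{S}_\delta(\mathbf{s}) =\left\{\tilde{\mathbf{s}} = (x,y,\nu) : x \in [\beta^{(L)} - \delta, \beta^{(L)} + \delta],\ y \in [\beta^{(H)} - \delta, \beta^{(H)} + \delta]\right\}. \tag{62}$$

The transition probability from the prior CBS $\mathbf{s}_k=\mathbf{s}$ to the posterior CBS $\hat{\mathbf{s}}_k \in \mathcal{S}_\delta(\hat{\mathbf{s}})$, with $\hat{\mathbf{s}} = (\hat\beta^{(L)}, \hat\beta^{(H)}, \hat\nu)$, is given by

$$P(\hat{\mathbf{s}}_k \in \mathcal{S}_\delta(\hat{\mathbf{s}})|\mathbf{s}_k = \mathbf{s},\psi_k = \psi) = \sum_{m=0}^{B} P_M(m|\psi)\,P_{\hat\nu}(\hat\nu|\mathbf{s},m) \times \chi\left(P_{FA}(\mathbf{s}, m, \hat\nu) \in [1 - \hat\beta^{(H)} -\delta, 1- \hat\beta^{(H)} + \delta]\right) \times \chi\left(P_{MD}(\mathbf{s}, m, \hat\nu) \in [\hat\beta^{(L)} - \delta, \hat\beta^{(L)} + \delta]\right). \tag{63}$$

Given the posterior expected occupancy $\hat{\boldsymbol\beta}_k$, the SU traffic $\mathbf{r}_k$, and the feedback $\mathbf{p}_k$, the prior expected occupancy in the next slot, $\boldsymbol\beta_{k+1}$, is given by

$$\beta_{k+1,i} = \frac{P_{B|P}(1|0,r_{k,i},p_{k,i})P_P(p_{k,i}|0,r_{k,i})(1 - \hat\beta_{k,i}) + P_{B|P}(1|1, r_{k,i}, p_{k,i})P_P(p_{k,i}|1, r_{k,i})\hat\beta_{k,i}}{P_P(p_{k,i}|0, r_{k,i})(1 - \hat\beta_{k,i}) + P_P(p_{k,i}|1, r_{k,i})\hat\beta_{k,i}}, \tag{64}$$

where $P_{B|P}(\cdot)$ and $P_P(\cdot)$ are defined in (29)-(30) and (25)-(26), so that we can write $\boldsymbol\beta_{k+1} = \beta(\hat{\boldsymbol\beta}_k, \mathbf{r}_k, \mathbf{p}_k)$ for a proper function $\beta(\cdot)$. This, in turn, maps to the prior CBS $\mathbf{s}_{k+1} = \mathcal{P}(\boldsymbol\beta_{k+1})$. Therefore, for a given total traffic budget $\Lambda_k = \Lambda$, the transition probability from the posterior CBS $\hat{\mathbf{s}}_k = \hat{\mathbf{s}} = (\hat\beta^{(L)}, \hat\beta^{(H)}, \hat\nu)$ to the prior CBS $\mathbf{s}_{k+1} \in \mathcal{S}_\delta(\mathbf{s})$ in the next slot is given by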


(sfc+i G ‘55(s)|sfc — s, Afc — A) 


(65) 


b,pe{o,i}^ * 

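where we have marginalized with respect to $\mathbf{b}_k$ and $\mathbf{p}_k$, and $\hat{\boldsymbol\beta}$ is given by $\hat\beta_i = \hat\beta^{(L)}$, $\forall i \le \hat\nu$, $\hat\beta_i = \hat\beta^{(H)}$, $\forall i > \hat\nu$, so that $\mathcal{P}(\hat{\boldsymbol\beta}) = \hat{\mathbf{s}}$.

These probabilities, along with (59), (60) and (61), do not admit a closed-form analytical expression, but can be computed numerically via Monte-Carlo simulation.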

VII. Numerical Results 


In this section, we present numerical results for a system with parameters: number of frequency bands $F=20$; SU and PU failure probabilities $\rho_S=\rho_P=0.1$; probability of new PU arrival $\zeta=0.095$; probability of a new data packet for an active PU $\theta=0.95$; number of SUs $N_S=100$; number of control channels for the SUs $B=5$; SU sensing and data transmission costs cs=ctx=^'; variance of the entries of the measurement matrix $\sigma_A^2=1$; variance of the measurement noise $\sigma_w^2= 1/20$; erasure probability $\epsilon=0.9$. The performance is evaluated over 2 x 10^ slots.

Fig. 4a plots the trade-off between the PU and SU throughputs, for different values of the total cost $C$ (accounting for both the cost of sensing and of data transmission). Fig. 4b plots the fraction of the total cost $C$ that is spent for spectrum sensing ($C_{\mathrm{sensing}}$), as a function of the PU throughput and total cost $C$. Note that the remaining fraction of the total cost is spent for actual data transmission. We notice that the throughput trade-off improves for higher values of the cost $C$. This is expected since, when more resources are available (higher $C$), there are more opportunities to perform data transmission for the SUs. At the same time, for higher values of the cost $C$, a larger fraction of this cost is spent for spectrum sensing. The reason is that, in order to accommodate more traffic for the SUs, with minimal interference to the PUs, the SUs need to acquire more accurate spectrum estimates, hence the sensing cost increases accordingly. Additionally, when the requirement on the throughput degradation to the PUs is very strict ($T_P$ approaching the maximal value), most of the resources are spent for spectrum sensing. This is because, in order to meet the demanding requirement on the throughput degradation to the PU, the SU traffic should be allocated only on those spectrum bands which are idle almost surely. Such a low level of uncertainty, in turn, demands significant sensing resources.

Figure 4. (a) Trade-off between PU and SU throughput for different values of the cost constraint $C$ (accounting for both sensing and data transmission costs). (b) Fraction of the total cost $C$ spent for spectrum sensing ($C_{\mathrm{sensing}}$), for different values of the PU throughput and total cost $C$.

Fig. 5a plots the total traffic allocated, $\Lambda_k$, as a function of the expected number of occupied spectrum bands, $\sum_i \hat\beta_{k,i}$, for the case $\lambda = 0.025$ and $\xi = 0.7$. Each sample corresponds to a given value of the posterior belief $\hat{\boldsymbol\beta}_k$, lying on the low-dimensional manifold generated by the CBS. We notice that, as the expected number of occupied spectrum bands increases, the total traffic allocated tends to decrease. In fact, when more spectrum bands are expected to be occupied, there are fewer opportunities for the SUs to occupy the remaining idle bands. Fig. 5b plots the SU sensing traffic per channel, $\psi_k$, and the expected number of measurements received at the CC, $\mathbb{E}(M_k)$, as a function of the entropy of the prior belief state, $\sum_i H_2(\beta_{k,i})$, where $\boldsymbol\beta_k$ lies on the low-dimensional manifold generated by the CBS. We notice that, as the entropy increases, i.e., the amount of uncertainty on the current spectrum occupancy increases, the SU sensing traffic also increases, and thus, more measurements are collected at the CC in order to reduce this uncertainty. The sensing resources are focused in those regions of the belief state where the spectrum occupancy state is more uncertain, yielding energy efficient resource utilization.

Figure 5. (a) Total traffic allocated $\Lambda_k$ as a function of the expected number of occupied spectrum bands, $\sum_i \hat\beta_{k,i}$. (b) SU sensing traffic per channel $\psi_k$ and expected number of measurements received at the CC, $\mathbb{E}(M_k)$, as a function of the entropy of the prior belief state, $\sum_i H_2(\beta_{k,i})$.

VIII. Conclusions 

In this paper, we have presented a cross-layer framework for joint distributed spectrum sensing, estimation and scheduling in a wireless network composed of SUs that opportunistically access portions of the spectrum left unused by a licensed network of PUs. In contrast to much prior work, we jointly address sensing and control, wherein the sensing affects the quality of the measurements. Inference of the underlying spectrum occupancy state is obtained by collecting compressed measurements at the CC from nearby SUs, and via local ACK/NACK feedback information from the PUs. In order to reduce the huge optimization and operational complexity due to the POMDP formulation, we have proposed a technique to project the belief state onto a low-dimensional manifold via the minimization of the Kullback-Leibler divergence. We have proved the optimality of a two-stage decomposition, which enables the decoupling of the optimization of sensing and scheduling. Additionally, we have proposed a partially myopic optimization scheme, which can be solved efficiently using convex optimization tools. Simulation results demonstrate how the proposed framework optimally balances the cost of acquisition of state information via distributed spectrum sensing and the cost of data transmission incurred by the SUs, while achieving the best trade-off between PU and SU throughput under the resource constraints available.

Appendix A: Proof of Theorem 1

The instantaneous expected sensing cost (see (31)) is given by $\psi_k B c_S$, and is thus independent of $\mathcal{H}_k$, given $\psi_k$. Moreover, the distribution of $\mathbf{b}_k$ given $\mathcal{H}_k$ is given by $P(\mathbf{b}_k = \mathbf{b}|\mathcal{H}_k) = \pi_k(\mathbf{b})$. Therefore, the prior belief $\pi_k$ is a sufficient statistic to select $\psi_k$ in slot $k \ge 0$.

After selecting $\psi_k$ and collecting $(\mathbf{A}_k,\mathbf{y}_k)$, the CC computes the posterior belief $\hat\pi_k = \hat\Pi(\pi_k, \mathbf{A}_k,\mathbf{y}_k)$ as in (24). In the scheduling phase, given the SU traffic $\mathbf{r}_k$ and the history $\hat{\mathcal{H}}_k$, the PU feedback has probability distribution

$$P(\mathbf{p}_k=\mathbf{p}|\mathbf{r}_k, \hat{\mathcal{H}}_k) =\sum_{\mathbf{b}\in\{0,1\}^F} P(\mathbf{p}_k=\mathbf{p}|\mathbf{b}_k=\mathbf{b}, \mathbf{r}_k, \hat{\mathcal{H}}_k)\,P(\mathbf{b}_k=\mathbf{b}|\mathbf{r}_k, \hat{\mathcal{H}}_k)$$

$$= \sum_{\mathbf{b}\in\{0,1\}^F} \prod_i P_P(p_i|b_i, r_{k,i})\,\hat\pi_k(\mathbf{b}) = P(\mathbf{p}_k = \mathbf{p}|\mathbf{r}_k, \hat\pi_k),$$

where we have used (27) and the definition of $\hat\pi_k$, thus the distribution is independent of $\hat{\mathcal{H}}_k$ given $(\mathbf{r}_k, \hat\pi_k)$. Therefore, the next prior belief $\pi_{k+1} = \Pi(\hat\pi_k, \mathbf{r}_k, \mathbf{p}_k)$ is statistically independent of $\hat{\mathcal{H}}_k$, given $(\mathbf{r}_k,\hat\pi_k)$. The instantaneous expected data transmission cost (32), and SU/PU throughputs (21) given $(\mathbf{r}_k,\hat{\mathcal{H}}_k)$, are given by

$$\mathbb{E}\left[\left.c_{TX}\sum_{i=1}^{F} r_{k,i}\,\right|\,\mathbf{r}_k, \hat{\mathcal{H}}_k\right] = c_{TX}\sum_{i=1}^{F} r_{k,i}, \tag{66}$$

$$\mathbb{E}\left[\left.T_X(\mathbf{b}_k,\mathbf{r}_k)\,\right|\,\mathbf{r}_k,\hat{\mathcal{H}}_k\right] =\sum_{\mathbf{b}\in\{0,1\}^F} P(\mathbf{b}_k = \mathbf{b}|\mathbf{r}_k,\hat{\mathcal{H}}_k)\,T_X(\mathbf{b},\mathbf{r}_k) = \sum_{\mathbf{b}\in\{0,1\}^F}\hat\pi_k(\mathbf{b})\,T_X(\mathbf{b}, \mathbf{r}_k).$$

All these metrics of interest are functions of $\mathbf{r}_k$ and $\hat\pi_k$ only. Therefore, the posterior belief state $\hat\pi_k$ is a sufficient statistic to schedule the traffic in slot $k \ge 0$.

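Appendix B: Proof of Theorem 2

Proof of Property 2) From (43), if $\hat\beta_i \ge \frac{\xi(1-\rho_S)}{(1-\xi)(1-\rho_P)+\xi(1-\rho_S)}$, then $r_{\max,i}(\hat{\boldsymbol\beta}) = 0$; since the traffic is restricted to $0 \le r_i \le r_{\max,i}(\hat{\boldsymbol\beta})$, it follows that $r_i(\hat{\boldsymbol\beta}, \Lambda) = 0$.

Proof of Property 1) Consider the optimization problem

$$\max_{0\le\mathbf{r}\le\mathbf{r}_{\max}} \sum_{i=1}^{F} \left[(1-\hat\beta_i)r_i + \frac{1-\xi}{\xi}\,\frac{1-\rho_P}{1-\rho_S}\,\hat\beta_i\right] e^{-r_i} \quad \text{s.t.} \quad \sum_i r_i = \Lambda, \tag{67}$$

obtained by replacing $\mathbf{r}=\Lambda\mathbf{z}$ in (45). Let $i$ be such that $\hat\beta_i < \frac{\xi(1-\rho_S)}{(1-\xi)(1-\rho_P)+\xi(1-\rho_S)}$, hence $r_{\max,i} > 0$. The second derivative of the objective function with respect to $r_i$ is given by

$$g(r_i) = \left[(1-\hat\beta_i)(r_i - 2) + \frac{1-\xi}{\xi}\,\frac{1-\rho_P}{1-\rho_S}\,\hat\beta_i\right]e^{-r_i}. \tag{68}$$

Since $r_i \le r_{\max,i}$, we then obtain

$$g(r_i) \le g(r_{\max,i}(\hat{\boldsymbol\beta})) \le -(1-\hat\beta_i)e^{-1} < 0. \tag{69}$$

Therefore, the objective function in (67) is a concave function of $\mathbf{r}$. Since the constraint set $\{0 \le \mathbf{r} \le \mathbf{r}_{\max}, \sum_i r_i = \Lambda\}$ is convex, the resulting optimization problem (67) is convex.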


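Proof of Property 3) We denote the maximizer of (67) as $\mathbf{r}^*(\Lambda)$, which obeys $0 \le \mathbf{r}^*(\Lambda) \le \mathbf{r}_{\max}$. 
Solving (67) with the Lagrange multiplier method, we obtain $\max_{0 \le \mathbf{r} \le \mathbf{r}_{\max}} l(\mathbf{r}, \mu)$, where 

$$l(\mathbf{r}, \mu) = \sum_{i=1}^{F} \left[(1-\beta_i) r_i + \frac{1-\epsilon}{\epsilon}\,\frac{1-\rho_P}{1-\rho_S}\,\beta_i\right] e^{-r_i} + \mu \Big(\sum_i r_i - \Lambda\Big), \tag{70}$$

whose maximizer is denoted as $\mathbf{r}(\mu)$. The optimal Lagrange multiplier $\mu^*(\Lambda)$ must be such that 

$$\sum_i r_i(\mu^*(\Lambda)) = \Lambda, \tag{71}$$

yielding $\mathbf{r}^*(\Lambda) = \mathbf{r}(\mu^*(\Lambda))$. We now solve the Lagrangian problem (70) for a specific $\mu$. Since 
the objective function is a concave function of $\mathbf{r}$ for $0 \le \mathbf{r} \le \mathbf{r}_{\max}$, we have the following cases: 

a) If $\left.\frac{\partial l(\mathbf{r},\mu)}{\partial r_i}\right|_{r_i = 0} < 0$, then $r_i(\mu) = 0$. Equivalently, 

$$-(1-\beta_i) + \frac{1-\epsilon}{\epsilon}\,\frac{1-\rho_P}{1-\rho_S}\,\beta_i > \mu. \tag{72}$$

b) If $\left.\frac{\partial l(\mathbf{r},\mu)}{\partial r_i}\right|_{r_i = r_{\max,i}} > 0$, then $r_i(\mu) = r_{\max,i}$. Equivalently, 

$$\left[(1-\beta_i)(r_{\max,i} - 1) + \frac{1-\epsilon}{\epsilon}\,\frac{1-\rho_P}{1-\rho_S}\,\beta_i\right] e^{-r_{\max,i}} < \mu. \tag{73}$$

c) Otherwise, $r_i(\mu)$ is the only $r_i \in [0, r_{\max,i}]$ such that 

$$\left[(1-\beta_i)(1 - r_i) - \frac{1-\epsilon}{\epsilon}\,\frac{1-\rho_P}{1-\rho_S}\,\beta_i\right] e^{-r_i} + \mu = 0. \tag{74}$$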

$\mathbf{r}(\mu)$ is a non-decreasing function of $\mu$ since, by increasing $\mu$, the inequality in (72) becomes 
tighter and the inequality in (73) becomes looser, and the left-hand expression of (74) is a 
decreasing function of $r_i$. Hence, $\mu^*(\Lambda)$ is a non-decreasing function of $\Lambda$, so that, for $\Lambda_1 > \Lambda_2$, 

$$\mathbf{r}^*(\Lambda_1) = \mathbf{r}(\mu^*(\Lambda_1)) \ge \mathbf{r}(\mu^*(\Lambda_2)) = \mathbf{r}^*(\Lambda_2). \tag{75}$$

Property 3) is thus proved. 
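
The case analysis a) to c) and the condition (71) lend themselves to a simple numerical procedure. The following Python sketch is only an illustration under stated assumptions: it takes a per-band objective of the form $[(1-\beta_i) r_i + \kappa \beta_i] e^{-r_i}$, where `kappa` is a hypothetical stand-in for the positive constant multiplying $\beta_i$ in (67), and it assumes the cap $r_{\max,i} = \max\{0, 1 - \kappa \beta_i / (1-\beta_i)\}$. The function names and the nested bisection are illustrative choices, not the authors' implementation.

```python
import math


def r_max(beta, kappa):
    # Assumed cap: largest r at which the per-band objective is non-decreasing.
    return max(0.0, 1.0 - kappa * beta / (1.0 - beta))


def r_of_mu(beta, kappa, mu, iters=60):
    """Maximizer over [0, r_max] of [(1-beta) r + kappa beta] e^{-r} + mu r."""
    cap = r_max(beta, kappa)

    def slope(r):
        # Partial derivative of the Lagrangian with respect to r.
        return ((1.0 - beta) * (1.0 - r) - kappa * beta) * math.exp(-r) + mu

    if cap == 0.0 or slope(0.0) <= 0.0:   # case a): stay at zero
        return 0.0
    if slope(cap) >= 0.0:                 # case b): saturate at the cap
        return cap
    lo, hi = 0.0, cap                     # case c): unique interior root
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def allocate(betas, kappa, total, iters=60):
    """Bisection on mu so that the per-band traffic sums to `total`."""
    if not 0.0 <= total <= sum(r_max(b, kappa) for b in betas):
        raise ValueError("total traffic is infeasible for the assumed caps")
    lo, hi = -1.0, 0.0  # mu = -1 gives all zeros, mu = 0 gives all caps
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(r_of_mu(b, kappa, mu) for b in betas) < total:
            lo = mu
        else:
            hi = mu
    return [r_of_mu(b, kappa, hi) for b in betas]
```

For instance, `allocate([0.1, 0.3, 0.6], kappa=0.5, total=1.0)` returns a traffic vector summing to the requested total. Under these assumptions, raising `total` does not lower the traffic on any band, in line with Property 3.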


Proof of Property 4) Let $\mathbf{z}^*$ be the optimizer of (45) and let $\beta_i > \beta_j$ for some $i \ne j$. Assume 
by contradiction that $z_i^* > z_j^*$. Now, we define a new SU traffic $\hat{\mathbf{z}}$ as follows: 

$$\hat{z}_l = z_l^*, \ \forall l \notin \{i,j\}, \quad \hat{z}_i = z_j^*, \quad \hat{z}_j = z_i^*. \tag{76}$$

Equivalently, the SU traffic allocated to the $i$th and $j$th bands under $\mathbf{z}^*$ is switched under the 
new traffic scheme $\hat{\mathbf{z}}$. Note that $\sum_l \hat{z}_l = \sum_l z_l^* = 1$, so that $\hat{\mathbf{z}}$ obeys the total traffic constraint, 
and is thus feasible with respect to (45). Let $v(\mathbf{z})$ be the value of the objective function in (45) 
as a function of $\mathbf{z}$. Due to the optimality of $\mathbf{z}^*$, we have that $v(\hat{\mathbf{z}}) - v(\mathbf{z}^*) \le 0$. We show that 
this cannot hold, hence proving the contradiction. We have 

$$v(\hat{\mathbf{z}}) - v(\mathbf{z}^*) = (\beta_i - \beta_j)\left[\gamma(z_j^*) - \gamma(z_i^*)\right], \tag{77}$$

where we have defined, for $z \in [0,1]$, 

$$\gamma(z) = \left[-\Lambda z + \frac{1-\epsilon}{\epsilon}\,\frac{1-\rho_P}{1-\rho_S}\right] e^{-\Lambda z}. \tag{78}$$

Note that $\gamma(z)$ is a decreasing function of $z \in [0,1]$. Therefore, since $z_i^* > z_j^*$, we have that 
$\gamma(z_i^*) < \gamma(z_j^*)$, hence $v(\hat{\mathbf{z}}) > v(\mathbf{z}^*)$, yielding a contradiction. 
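
The swap argument can be checked numerically. The Python sketch below is a hedged illustration, not part of the original proof: it assumes an objective of the form $v(\mathbf{z}) = \sum_l [(1-\beta_l)\Lambda z_l + \kappa \beta_l] e^{-\Lambda z_l}$, where `kappa` is a hypothetical positive constant standing in for the coefficient of $\beta_l$, and all function names are illustrative.

```python
import math


def gamma(z, total, kappa):
    # Assumed form of gamma(z): (-total * z + kappa) * exp(-total * z).
    return (-total * z + kappa) * math.exp(-total * z)


def objective(z, betas, total, kappa):
    # Assumed objective v(z), summed over the bands.
    return sum(
        ((1.0 - b) * total * zl + kappa * b) * math.exp(-total * zl)
        for zl, b in zip(z, betas)
    )


def swap_gain(z, betas, i, j, total, kappa):
    """Change in the objective when bands i and j exchange their traffic."""
    swapped = list(z)
    swapped[i], swapped[j] = z[j], z[i]
    return objective(swapped, betas, total, kappa) - objective(z, betas, total, kappa)
```

With `betas = [0.2, 0.7]` and `z = [0.3, 0.7]`, the band that is more likely busy carries more traffic. Under the assumed objective, `swap_gain(z, betas, 1, 0, 1.0, 0.5)` is positive and equals $(\beta_i - \beta_j)[\gamma(z_j) - \gamma(z_i)]$, so the exchange strictly improves the allocation.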

Appendix C: Proof of Theorem 3 


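The optimization problem (48) can be decomposed into the following two stages. First, given 
$\beta^{(L)}$ and $\beta^{(H)}$ with $\beta^{(L)} < \beta^{(H)}$, determine the mapping function $\phi(\cdot)$ that minimizes the objective of (48), or equivalently 

$$\phi^* = \arg\max_{\phi} \sum_{i=1}^{F} \left[\beta_i \ln \beta^{(\phi(i))} + (1 - \beta_i) \ln\left(1 - \beta^{(\phi(i))}\right)\right]. \tag{79}$$

Second, determine $\beta^{(L)}$ and $\beta^{(H)}$ by substituting the optimal mapping $\phi^*$ into (48) and minimizing the resulting objective over $(\beta^{(L)}, \beta^{(H)})$. 

The solution to the intermediate problem (79) is trivially given by 

$$\phi(i) = L \Leftrightarrow \beta_i \ln \beta^{(L)} + (1 - \beta_i) \ln\left(1 - \beta^{(L)}\right) \ge \beta_i \ln \beta^{(H)} + (1 - \beta_i) \ln\left(1 - \beta^{(H)}\right), \tag{80}$$

yielding 

$$\phi(i) = L \Leftrightarrow \beta_i \le \left[1 + \frac{\ln \beta^{(H)} - \ln \beta^{(L)}}{\ln\left(1 - \beta^{(L)}\right) - \ln\left(1 - \beta^{(H)}\right)}\right]^{-1}. \tag{81}$$

Note that the solution is of threshold type. Therefore, defining the permutation function $m(\cdot)$ as 
in the statement of the theorem, there exists $\nu$ such that $\phi(m(i)) = L$ for $i \le \nu$, and $\phi(m(i)) = H$ 
for $i > \nu$. We can thus restate the optimization problem (48) by enforcing this solution, yielding 

$$\left(\beta^{(L)*}, \beta^{(H)*}, \nu^*\right) = \arg\max \sum_{i=1}^{\nu} \left[\beta_{m(i)} \ln \beta^{(L)} + (1 - \beta_{m(i)}) \ln\left(1 - \beta^{(L)}\right)\right] + \sum_{i=\nu+1}^{F} \left[\beta_{m(i)} \ln \beta^{(H)} + (1 - \beta_{m(i)}) \ln\left(1 - \beta^{(H)}\right)\right], \tag{82}$$

so that $\phi^*(i) = L \Leftrightarrow m(i) \le \nu^*$. We solve (82) with respect to $(\beta^{(L)}, \beta^{(H)})$ first, for a fixed $\nu$, 
and then optimize over $\nu$. We obtain 

$$\beta^{(L)}(\nu) = \arg\max_{\beta^{(L)}} \sum_{i=1}^{\nu} \left[\beta_{m(i)} \ln \beta^{(L)} + (1 - \beta_{m(i)}) \ln\left(1 - \beta^{(L)}\right)\right], \tag{83}$$

$$\beta^{(H)}(\nu) = \arg\max_{\beta^{(H)}} \sum_{i=\nu+1}^{F} \left[\beta_{m(i)} \ln \beta^{(H)} + (1 - \beta_{m(i)}) \ln\left(1 - \beta^{(H)}\right)\right], \tag{84}$$

yielding (50). By replacing $\left(\beta^{(L)}(\nu), \beta^{(H)}(\nu)\right)$ into (82), we finally obtain $\nu^*$ as in (51). 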
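
The two-stage procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: each $\beta_i$ is assumed to lie strictly in $(0,1)$; the maximizers of (83) and (84) are taken to be the group averages, which follows from setting the derivative of each sum to zero and is assumed here to coincide with (50); and an exhaustive search over $\nu$ stands in for (51). All function names are hypothetical.

```python
import math


def group_score(betas, level):
    # Sum of beta * ln(level) + (1 - beta) * ln(1 - level) over one group.
    return sum(b * math.log(level) + (1.0 - b) * math.log(1.0 - level) for b in betas)


def threshold(b_low, b_high):
    # Threshold on beta_i below which a band is mapped to the low level.
    num = math.log(b_high) - math.log(b_low)
    den = math.log(1.0 - b_low) - math.log(1.0 - b_high)
    return 1.0 / (1.0 + num / den)


def two_level_fit(betas):
    """Return (b_low, b_high, nu, labels) for a two-level belief approximation."""
    order = sorted(range(len(betas)), key=lambda i: betas[i])  # plays the role of m(.)
    ranked = [betas[i] for i in order]
    best = None
    for nu in range(1, len(ranked)):          # keep both groups non-empty
        low, high = ranked[:nu], ranked[nu:]
        b_low = sum(low) / len(low)           # assumed maximizer of the low-group sum
        b_high = sum(high) / len(high)        # assumed maximizer of the high-group sum
        score = group_score(low, b_low) + group_score(high, b_high)
        if best is None or score > best[0]:
            best = (score, nu, b_low, b_high)
    _, nu, b_low, b_high = best
    labels = ["L"] * len(betas)
    for position in range(nu, len(betas)):
        labels[order[position]] = "H"
    return b_low, b_high, nu, labels
```

Calling `two_level_fit([0.1, 0.9, 0.15, 0.85])` splits the bands into two groups of two, and `threshold(b_low, b_high)` then separates the two groups, consistent with the threshold structure of the mapping.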